Using an MM-principle to enforce a sparsity constraint on fast image data estimation from large image data sets

ABSTRACT

The mathematical majorize-minimize principle is applied in various ways to process the image data to provide a more reliable image from the backscatter data using a reduced amount of memory and processing resources. A processing device processes the data set by creating an estimated image value for each voxel in the image by iteratively deriving the estimated image value through application of a majorize-minimize principle to solve a maximum a posteriori (MAP) estimation problem associated with a mathematical model of image data from the data. A prior probability density function for the unknown reflection coefficients is used to apply an assumption that a majority of the reflection coefficients are small. The described prior probability density functions promote sparse solutions automatically estimated from the observed data.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional application No. 61/835,579, filed Jun. 15, 2013 and U.S. Provisional application No. 61/835,580, filed Jun. 15, 2013, each of which is incorporated by reference in its entirety herein.

GOVERNMENT LICENSE RIGHTS

This invention was made with government support under contract number W911NF-1120039 awarded by US Army Research Laboratory and the Army Research Office. The government has certain rights in the invention.

FIELD OF THE INVENTION

This invention relates generally to image data processing and more specifically to applying the majorize-minimize mathematical principle to achieve fast image data estimation for large image data sets.

BACKGROUND

Half of the coalition forces casualties in the Iraq and Afghanistan wars are attributed to land mines and improvised explosive devices (IEDs). Consequently, a critical goal of the US Army is to develop robust and effective land-mines/IED detection systems that are deployable in combat environments. Accordingly, there is a desire to create robust algorithms for sub-surface imaging using ground penetrating radar (GPR) data and thus facilitate higher IED detection rates and lower false alarm probabilities.

Referring to the example schematic of FIG. 1, a GPR imaging system transmits signals from an above ground transmitter 102 into the ground of a scene-of-interest (SOI) 104. Signals that are reflected off of objects 106, 108, and 110 in the SOI 104 are received by one or more receivers 112 to generate images that convey relevant information about the objects 106, 108, and 110 (also known as scatters) within the SOI 104. As a transmitted pulse propagates into a SOI 104, reflections occur whenever the pulse encounters changes in the dielectric constant (∈_(r)) of the material through which the pulse propagates. Such a transition occurs, for example, when the radar pulse moving through dirt encounters a metal object such as an IED. The strength of a reflection due to a patch of terrain can be quantified by its reflection coefficient, which is proportional to the overall change in dielectric constant within the patch.

In principle, GPR imaging is well-suited for detecting IEDs and land mines because these targets are expected to have much larger dielectric constants than their surrounding material, such as soil and rocks. It should be noted that for a high frequency transmission pulse (i.e., greater than 3 MHz), the backscattered signal of a target can be well approximated as the sum of the backscattered signals of individual elementary scatterers.

The phrase GPR image reconstruction refers to the process of sub-dividing a SOI into a grid of voxels (i.e., volume elements) and estimating the reflection coefficients of the voxels from radar-return data. Existing image formation techniques for GPR datasets include the delay-and-sum (DAS) or backprojection algorithm and the recursive side-lobe minimization (RSM) algorithm.

The DAS algorithm is probably the most commonly used image formation technique in radar applications because its implementation is straightforward. The DAS algorithm simply estimates the reflectance coefficient of a voxel by coherently adding up, across the receiver-aperture, all the backscatter contributions due to that specific voxel. Although the DAS algorithm is a fast and easy-to-implement method, it tends to produce images that suffer from large side-lobes and poor resolution. The identification of targets with relatively small radar cross section (RCS) is thus difficult from DAS images because targets with large a RCS produce large side-lobes that may obscure adjacent targets with a smaller RCS.

The RSM algorithm is an extension of the DAS algorithm that provides better noise and side-lobe reduction, but no improvement in image resolution. Moreover, results from the RSM algorithm are not always consistent. This may be attributed to the algorithm's use of randomly selected apertures or windows through which a measurement is taken. The requirement for a minimum threshold for probability detection and false alarms would make it difficult to use the RSM algorithm in practical applications.

Both the DAS and RSM algorithms fail to take advantage of valuable a-priori or known information about the scene-of-interest in a GPR context, namely sparsity. More specifically, because only a few scatterers are present in a typical scene-of-interest, in other words most of the backscatter data is zero, it is reasonable to expect better estimates of the reflectance coefficients when this a-priori sparsity assumption is incorporated into the image formation process.

Several linear regression techniques for sparse data set applications are known. Algorithms for sparse linear regression can be roughly divided into the following categories: “greedy” search heuristics, iterative re-weighted linear least squares algorithms, and linear inversion and deconvolution via l_(p)-regularized least-squares.

“Greedy” search heuristics such as projection pursuit, orthogonal matching pursuit (OMP), and the iterative deconvolution algorithm known as CLEAN comprise one category of algorithms for sparse linear regression. Although these algorithms have relatively low computational complexity, regularized least-squares methods have been found to perform better than greedy approaches for sparse reconstruction problems in many radar imaging problems. For instance, the known sparsity learning via iterative minimization (SLIM) algorithm incorporates a-priori sparsity information about the scene-of-interest and provides good results. However, its high computational cost and memory-size requirements may make it inapplicable in real-time settings.

Another known approach to sparse linear regression is the iterative re-weighted linear least-squares (IRLS), where the solution of the mathematical l₁-minimization problem is given by solving a sequence of re-weighted l₂-minimization problems. A conceptually similar approach is to compute the l₀-minimization by solving a sequence of re-weighted l₁-minimization problems.

Still another known approach to sparse linear regression are the linear inversion and deconvolution via l_(p)-regularized least-squares (LS) methods. In these methods, the reflection coefficients are estimated using

$\begin{matrix} {{\hat{x} = {{\underset{x}{argmin}{{y - {Ax}}}_{2}^{2}} + {\lambda{x}_{p}^{p}}}},} & (1) \end{matrix}$ where λ is the regularization parameter. l₁-regularization (i.e., p=1) incorporates the sparsity assumptions by approximating the minimum l_(o) problem, which is to find the most sparse vector that fits the data model. Directly solving the l₀-regularization problem is typically not even attempted because it is known to be non-deterministic polynomial-time hard (NP-hard), i.e., very processing intensive to solve. To date, l₁-regularization has been the recommended approach for sparse radar image reconstruction.

So called l₁-LS algorithms incorporate the sparsity assumption, generally give acceptable results, and could be made reasonably fast via speed-up techniques or parallel/distributed implementations. LS-based estimation can, however, be ineffective and biased in the presence of outliers in the data. This is a particular disadvantage, however, because in practical settings, the presence of outliers in measurements is to be expected.

More specifically, the l₁-LS estimation method has been known for some time, wherein the concept has been popularized in the statistics and signal processing communities as the Least Absolute Selection and Shrinkage Operator (LASSO) and Basis Pursuit denoising, respectively. A number of iterative algorithms have been introduced for solving the l₁-LS estimation problem. Classical approaches use linear programming or interior-point methods. However, in many real-world and large scale problems, these traditional approaches suffer from high computational cost and lack of estimation accuracy. Heuristic greedy alternatives like Orthogonal Matching Pursuit and Least Angle Regression (LARS) have also been proposed. These algorithms are also likely to fail when applied to real-world, large-scale problems. Several other types of algorithms for providing l₁-LS estimates exist in the literature and others continue to be proposed.

Some of the shortcomings of the DAS and RSM algorithms may be attributed to the fact that their data model does not take into consideration known prior information about SOIs. Since only a few scatterers are present in a typical SOI, better reflection coefficient estimates can be expected when the a-priori sparsity assumption is incorporated into the model. The SLIM algorithm produces good results by incorporating the assumption the SOIs are sparsely populated by scatterers. However, its high computational cost may make the SLIM algorithm impractical for large-scale real time applications. The class of l₁-regularized least squares (l₁-LS) algorithms have been recommended for radar imaging where sparse solutions are expected. This class of algorithms have been found to perform well in a number of applications, such as machine learning and neuroimaging. In off-line applications, where training data is attainable, generalized cross-validation or similar techniques can be used to obtain optimal or near-optimal regularization parameter. However, in real-time on-line applications such as GPR imaging, there is no straightforward way to choose an appropriate regularization parameter. It may therefore be difficult to effectively take advantage of l₁-LS algorithms in GPR imaging problems or other real-time applications.

SUMMARY

Generally speaking and pursuant to these various embodiments, the mathematical majorize-minimize principle is applied in various ways to process the image data to provide a more reliable image from the backscatter data using a reduced amount of memory and processing resources. In one approach, a processing device processes the data set by creating an estimated image value for each voxel in the image by iteratively deriving the estimated image value through application of a majorize-minimize principle to solve a maximum a posteriori (MAP) estimation problem associated with a mathematical model of image data from the data. As part of this approach, a prior probability density function for the unknown reflection coefficients is used to apply an assumption that a majority of the reflection coefficients are small. The prior probability density function used to apply an assumption that a majority of the reflection coefficients are small can be applied in a variety of ways. So configured, the described approaches produce sparse images with significantly suppressed background noise and sidelobes. The described prior probability density functions promote sparse solutions automatically estimated from the observed data.

The application of the majorize-minimize principle can be further optimized for the GPR context by accounting for a symmetric nature of a given radar pulse, accounting for similar discrete time delays between transmission of a given radar pulse and reception of reflections from the given radar pulse, and accounting for a short duration of the given radar pulse. Application of these assumptions results in a relatively straight forward algorithm that can produce higher quality images while using reduced memory and processing resources.

Accordingly, the above methods use the particularized data collected using transmitters and receivers to output images representing objects in a SOI. Devices, including various computer readable media, incorporating these methods then provide for display of image data using reduced processing and memory resources and at increased speed as processed according to these techniques.

These and other benefits may become clearer upon making a thorough review and study of the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The above needs are at least partially met through provision of the methods and apparatuses for receiving and processing image data as described in the following detailed description, particularly when studied in conjunction with the drawings, wherein:

FIG. 1 comprises a schematic of operation a prior art GPR system;

FIG. 2 comprises a schematic of an example radar system as configured in accordance with various embodiments of the invention;

FIG. 3 comprises a perspective view of an example implementation of a prior art ultra wide-band (UWB) synchronous reconstruction (SIRE) radar system;

FIG. 4 comprises a schematic demonstrating a prior art approach to obtaining initial data from a SOI;

FIG. 5 comprises a graph demonstrating application of the mathematical majorize-minimize principle;

FIG. 6 comprises a graph illustrating a Laplacian prior density function (pdf) and a Laplacian-like prior density function as configured in accordance with various embodiments of the invention;

FIG. 7 comprises a graph illustrating a Butterworth prior density function as configured in accordance with various embodiments of the invention;

FIG. 8 comprises a flow diagram of an example algorithm applying the M-M principle to a MAP algorithm using various prior density functions as configured in accordance with various embodiments of the invention;

FIG. 9 comprises a graph of simulated GPR data used to evaluate various approaches described herein;

FIG. 10 comprises a displayed image derived from the data of FIG. 9 using a prior art DAS method;

FIG. 11 comprises a displayed image derived from the data of FIG. 9 using a MAP method as configured in accordance with various embodiments of the invention;

FIG. 12 comprises a graph displaying receiver operating curves for the prior art DAS method and a MAP method described herein as applied, respectively, to create the images of FIGS. 10 and 11;

FIG. 13 comprises a graph displaying a zoomed in portion of the graph of FIG. 12;

FIG. 14 comprises a displayed image derived from the real ARL data using a prior art DAS method;

FIG. 15 comprises a displayed image derived form the real ARL data using a MAP method as configured in accordance with various embodiments of the invention;

FIG. 16 comprises a displayed image derived from a set of simulated data using a prior art DAS method;

FIG. 17 comprises a displayed image of the objects in the SOI of FIG. 16, in this case using image data processed according to prior art RSM algorithm;

FIG. 18 comprises a displayed image of the objects in the SOI of FIG. 16, in this case using image data processed according to an LMM algorithm;

FIG. 19 comprises a displayed image of the objects in the SOI of FIG. 16, in this case using image data processed according to an MAP algorithm using a Jeffreys' prior as configured in accordance with various embodiments of the invention;

FIG. 20 comprises a displayed image of the objects in the SOI of FIG. 16, in this case using image data processed according to an MAP algorithm using a Laplacian-like prior as configured in accordance with various embodiments of the invention;

FIG. 21 comprises a displayed image of the objects in the SOI of FIG. 16, in this case using image data processed according to an MAP algorithm using a Laplacian prior as configured in accordance with various embodiments of the invention;

FIG. 22 comprises a displayed image derived from a set of simulated data using a prior art DAS method;

FIG. 23 comprises a displayed image of the objects in the SOI of FIG. 22, in this case using image data processed according to prior art RSM algorithm;

FIG. 24 comprises a displayed image of the objects in the SOI of FIG. 22, in this case using image data processed according to an LMM algorithm;

FIG. 25 comprises a displayed image of the objects in the SOI of FIG. 22, in this case using image data processed according to an MAP algorithm using a Butterworth prior as configured in accordance with various embodiments of the invention;

FIG. 26 comprises a displayed image derived from a second set of ARL data using a prior art DAS method;

FIG. 27 comprises a displayed image of the objects in the SOI of FIG. 26, in this case using image data processed according to prior art RSM algorithm;

FIG. 28 comprises a displayed image of the objects in the SOI of FIG. 26, in this case using image data processed according to an LMM algorithm;

FIG. 29 comprises a displayed image of the objects in the SOI of FIG. 26, in this case using image data processed according to an MAP algorithm using a Butterworth prior as configured in accordance with various embodiments of the invention.

Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted to facilitate a less obstructed view of these various embodiments. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary technical meaning as is accorded to such terms and expressions by persons skilled in the technical field as set forth above except where different specific meanings have otherwise been set forth herein.

DETAILED DESCRIPTION

Referring now to the drawings, and in particular to FIG. 2, an illustrative apparatus that is compatible with many of these teachings will now be presented. In a GPR application, a vehicle 202 includes a plurality of radar transmission devices 204 and 206 mounted on the vehicle 202. The radar transmission devices 204 and 206 are configured to transmit radar pulses 208 and 210 into a scene of interest 212. A plurality of J radar reception devices 214 are mounted on the vehicle 202 and configured to detect magnitude of signal reflections from the scene of interest 212 from the radar pulses 208 and 210. The vehicle 202 can be any structure able to carry the radar transmitters and receivers to investigate a scene of interest.

FIG. 3 illustrates one example implementation: a truck 302 mounted ultra wide-band (UWB) synchronous reconstruction (SIRE) radar system developed by the US Army Research Laboratory (ARL) in Adelphi, Md. This system includes a left transmit antenna 304, a right transmit antenna 306, and 16 receive antennas 314. Other systems may have different numbers of and arrangement of transmit and receive antennas. The transmit and receive antennas are mounted to a support structure 321, which supports these elements on the truck 302. The truck 302 drives through a scene of interest while the transmit antennas 304 and 306 alternately transmit radar pulses and the receiving antennas 314 receive reflections of the transmitted radar pulses from backscatter objects in the scene of interest.

Referring again to the example of FIG. 2, the positions along the vehicle's 202 path at which a radar pulse is transmitted are referred to as the transmit locations, I. As the vehicle 202 moves, the two transmit antennas 204 and 206 alternately send respective probing pulses 208 and 210 toward the SOI 212, and the radar-return profiles reflected from the SOI 212 are captured by multiple receive antennas 214 at each transmit location.

A location determination device 220 detects the location of the vehicle 202 at times of transmission of the radar pulse from the plurality of radar transmission devices 204 and 206 and reception of the signal reflections by the radar reception devices 214. In one example, the location determination device is a global positioning system (GPS) device as commonly known and used, although other position determination devices can be used. Accordingly, the positioning coordinates of the active transmit antenna 204 or 206 and all the receive antennas 214 are also logged. When using the UWB SIRE system of FIG. 3, there is typically a minimum range of detection of objects in the SOI from the vehicle 2020 of about 8 meters, a maximum range of about 34 meters, and a cross-range of about 25 meters. The UWB SIRE system uses a FORD EXPLORER as the vehicle 202 such that the transmit antennas 204 and 206 have about a two meter separation between 16 receivers.

In one approach, the vehicle 202 includes a processing device 242 in communication with the location determining device 220, the transmit antennas 204 and 206, and the receivers 214 to coordinate their various operations and to store information related to their operations in a memory device 244. Optionally, a display 246 is included with the vehicle 202 to display an image related to the data received from the scanning of the scene of interest 212.

Due to the large size of the scene-of-interest, an initial data set is not generated by processing all voxels at once. Such an image would have cross-range resolution that varies from the near-range to the far-range voxels. The voxels in the near-range would have much larger resolution than those for the far-range ones. To create GPR images with consistent resolution across the scene-of-interest, we use the mosaicking approach discussed in L. Nguyen, “Signal and Image Processing Algorithms for the U.S. Army Research Laboratory Ultra-wideband (UWB) Synchronous Impulse Reconstruction (SIRE) Radar,” ARL Technical Report, ARL-TR-4784, April 2009, which is incorporated by reference and described with reference to FIG. 4. The steps taken to produce a complete image of the scene-of-interest (using the mosaicking approach) are described as follows. The image space associated with the scene-of-interest is divided into 32 sub-images of size 25×2 m². Each sub-image has 250 voxels in the cross-range direction and 100 voxels in the down-range direction. Thus, each sub-image has L=25000 voxels. The aperture (meaning the distance over which the vehicle travels while accumulating data for a SOI) is divided into 32 sub-apertures corresponding to separate, overlapping distances traveled by the vehicle, where adjacent sub-apertures (or vehicle travel windows) overlap by approximately 2 meters. Each sub-aperture has 43 transmit locations and is approximately of size 12×2 m². The radar-return and location positioning measurements associated with a sub-aperture are used to estimate the reflectance coefficients for the corresponding sub-image. The reconstructed sub-images are merged together to obtain the complete image of the scene-of-interest.

In another approach, referring again to FIG. 2, a separate computing device 260 may receive an initial data set from the vehicle based system to further process to create and optionally display images related to the SOI. The computing device 260 will typically include a processing device 262 in communication with a memory 264 to allow for processing the data according to any of the methods described herein. A display 266 may be included with or separate from the computing device 260 and controlled to display images resulting from the processing of the data received from the vehicle 202. Those skilled in the art will recognize and appreciate that such processing devices 242 and 262 can comprise a fixed-purpose hard-wired platform, including, for example, parallel processing devices, or can comprise a partially or wholly programmable platform. All of these architectural options are well known and understood in the art and require no further description here. Moreover, the memory devices 244 and 264 may be separate from or combined with the respective processing devices 242 and 262. Any memory device or arrangement capable of facilitating the processing described herein may be used.

With respect to the collection of data, consider a single scatterer, with spatial position p_(s), located at the center of a voxel (i.e., volume element) within the SOI. The spatial positions of the active transmit antenna and a receive antenna are denoted by p_(t) and p_(r), respectively. If the contributions of measurement noise are momentarily ignored, the relationship between the transmitted signal p(t) and the received signal g_(s)(t) can be modeled as g _(s)(t)=α_(s) ·p(t−τ(p _(s) ,p _(t) ,p _(r)))·x _(s),  (2) where x_(s) is the reflection coefficient of the voxel, τ(p_(s), p_(t), p_(r)) is the time it takes for the pulse to travel from the transmit antenna to the scatterer and back to the receive antenna, and α_(s) is the attenuation the pulse undergoes along the round-trip path.

The single scatterer model in (2) can be generalized to describe all the measurements captured by the SIRE GPR system. The SOI is subdivided into a rectangular grid of L voxels and the unknown reflection coefficient at the l^(th) voxel is denoted by x_(l). Extending the model in (2) to the SIRE system, the output of the j^(th) receive antenna at the i^(th) vehicle-stop is given by

$\begin{matrix} {{{s_{ij}(t)} = {{\sum\limits_{l = 1}^{L}\;{\alpha_{ijl} \cdot {p\left( {t - \tau_{ijl}} \right)} \cdot x_{l}}} + {w_{ij}(t)}}},{i = 1},2,\ldots\mspace{14mu},I,{j = 1},2,\ldots\mspace{14mu},{J.}} & (3) \end{matrix}$

In this equation, τ_(ijl) is the time it takes for the transmitted pulse to propagate from the active transmit antenna at the i^(th) transmit location to the l^(th) voxel and for the backscattered signal to return to the j^(th) receive antenna. The parameter τ_(ijl) is given by

$\begin{matrix} {{\tau_{ijl} = \frac{d_{il} + d_{ikl}}{c}},} & (4) \end{matrix}$ where d_(il) denotes the distance from the active transmit antenna at the i^(th) transmit location to the l^(th) voxel, d_(ijl) denotes the return distance from the l^(th) voxel to the i^(th) receive antenna when the truck is at the i^(th) transmit location, and c is the speed of light.

The notation α_(ijl) is the propagation loss that the transmitted pulse undergoes as it travels from the active transmit antenna at the i^(th) transmit location to the l^(th) voxel and back to the j^(th) receive antenna. The parameter α_(ijl) is given by

$\begin{matrix} {\alpha_{ijl} = {\frac{1}{d_{il} \cdot d_{ijl}}.}} & (5) \end{matrix}$ The notation w_(ij)(t) represents the noise contribution.

The above mathematical model defined in (3) is continuous whereas, in practice, the SIRE GPR system only stores discrete and separate sampled versions of the return signals. Thus, we introduce the following discrete-time signals to adpat the above model to the real world application: for i=1, 2, . . . , I, j=1, 2, . . . , J and n=0, 1, . . . , N−1, y_(ij)[n]

s_(ij)(nT_(s))  (6) e_(ij)[n]

w_(ij)(nT_(s))  (7) where T_(s) is the sampling interval and N is the number of samples per radar return. From (6) and (7), we can express y_(ij)[n] as

$\begin{matrix} {{y_{ij}\lbrack n\rbrack} = {{\sum\limits_{l = 1}^{L}\;{x_{l}\alpha_{ijl}{p\left( {{nT}_{s} - \tau_{ijl}} \right)}}} + {e_{ij}\lbrack n\rbrack}}} & (8) \end{matrix}$ and write the corresponding system of equations

$\begin{matrix} {{y_{ij}\lbrack 0\rbrack} = {{x_{1}\alpha_{{ij}\; 1}{p\left( {{0 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} + {x_{2}\alpha_{{ij}\; 2}{p\left( {{0 \cdot T_{s}} - \tau_{{ij}\; 2}} \right)}} + \ldots + {x_{L}\alpha_{ijL}{p\left( {{0 \cdot T_{s}} - \tau_{ijL}} \right)}} + {e_{ij}\lbrack 0\rbrack}}} & (9) \\ {{{y_{ij}\lbrack 1\rbrack} = {{x_{1}\alpha_{{ij}\; 1}{p\left( {{1 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} + {x_{2}\alpha_{{ij}\; 2}{p\left( {{1 \cdot T_{s}} - \tau_{{ij}\; 2}} \right)}} + \ldots + {x_{L}\alpha_{ijL}{p\left( {{1 \cdot T_{s}} - \tau_{ijL}} \right)}} + {e_{ij}\lbrack 1\rbrack}}}\mspace{20mu}\vdots} & (10) \\ {{y_{ij}\left\lbrack {N - 1} \right\rbrack} = {{x_{1}\alpha_{{ij}\; 1}{p\left( {{\left( {N - 1} \right) \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} + {x_{2}\alpha_{{ij}\; 2}{p\left( {{\left( {N - 1} \right) \cdot T_{s}} - \tau_{{ij}\; 2}} \right)}} + \ldots + {x_{L}\alpha_{ijL}{p\left( {{\left( {N - 1} \right) \cdot T_{s}} - \tau_{ijL}} \right)}} + {e_{ij}\left\lbrack {N - 1} \right\rbrack}}} & (11) \end{matrix}$ This system of equations can be written in matrix form as y _(ij) =A _(ij) x+e _(ij)  (12) where the L×1 vector x, and N×1 vectors y_(ij) and e_(ij) are defined to be

$\begin{matrix} {{x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{L} \end{bmatrix}},{e_{ij} = \begin{bmatrix} {e_{ij}\lbrack 0\rbrack} \\ {e_{ij}\lbrack 1\rbrack} \\ \vdots \\ {e_{ij}\left\lbrack {N - 1} \right\rbrack} \end{bmatrix}},{y_{ij} = {\begin{bmatrix} {y_{ij}\lbrack 0\rbrack} \\ {y_{ij}\lbrack 1\rbrack} \\ \vdots \\ {u_{ij}\left\lbrack {N - 1} \right\rbrack} \end{bmatrix}.}}} & (13) \end{matrix}$ The matrix A_(ij) is an N×L matrix that contains shifted and scaled versions of the transmitted pulse. From (9)-(11), the matrix A_(ij) is defined to be

$\begin{matrix} {A_{ij}\overset{\bigtriangleup}{=}\begin{bmatrix} {\alpha_{{ij}\; 1}{p\left( {{0 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} & {\alpha_{{ij}\; 2}{p\left( {{0 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} & \ldots & {\alpha_{{ij}\; L}{p\left( {{0 \cdot T_{s}} - \tau_{{ij}\; L}} \right)}} \\ {\alpha_{{ij}\; 1}{p\left( {{1 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} & {\alpha_{{ij}\; 2}{p\left( {{1 \cdot T_{s}} - \tau_{{ij}\; 1}} \right)}} & \ldots & {\alpha_{{ij}\; L}{p\left( {{1 \cdot T_{s}} - \tau_{{ij}\; L}} \right)}} \\ \vdots & \vdots & \ddots & \; \\ {\alpha_{{ij}\; 1}{p\begin{pmatrix} {\left( {N - 1} \right) \cdot} \\ {T_{s} - \tau_{{ij}\; 1}} \end{pmatrix}}} & {\alpha_{{ij}\; 2}{p\begin{pmatrix} {\left( {N - 1} \right) \cdot} \\ {T_{s} - \tau_{{ij}\; 2}} \end{pmatrix}}} & \ldots & {\alpha_{{ij}\; L}{p\begin{pmatrix} {\left( {N - 1} \right) \cdot} \\ {T_{s} - \tau_{{ij}\; L}} \end{pmatrix}}} \end{bmatrix}} & (14) \end{matrix}$ We concatenate the sampled data vectors {y_(ij)} for all transmitter and receiver pairs to obtain the K×1 data vector y y

[y₁₁ ^(T),y₁₂ ^(T), . . . ,y_(1J) ^(T),y₂₁ ^(T),y₂₂ ^(T), . . . ,y_(2J) ^(T), . . . ,y_(I1) ^(T),y_(I2) ^(T), . . . ,y_(IJ) ^(T)]^(T)  (15) where K=IJN. Extending (12) to account for all transmitter and receiver pairs yields the desired model y=Ax+e  (16) where A is a known K×L matrix given by A

[A₁₁ ^(T),A₁₂ ^(T), . . . ,A_(1J) ^(T),A₂₁ ^(T),A₂₂ ^(T), . . . ,A_(2J) ^(T), . . . ,A_(I1) ^(T),A_(I2) ^(T), . . . ,A_(IJ) ^(T)]^(T)  (17) and e

[e₁₁ ^(T),e₁₂ ^(T), . . . ,e_(1J) ^(T),e₂₁ ^(T),e₂₂ ^(T), . . . ,e_(2J) ^(T), . . . ,e_(I1) ^(T),e_(I2) ^(T), . . . ,e_(IJ) ^(T)]^(T) is a K×1 noise vector that is assumed to be zero mean Gaussian white noise with variance σ².

Given the transmitted pulse p(t) and observed data y, the problem is to estimate the reflectance coefficient vector x. Note, the time delays {τ_(ijl)} and attenuation values {α_(ijl)} are computed using (4) and (5), respectively, and the geometry defined by the chosen SOI and the locations of the transmit and receive antennas.

The Majorize-Minimize Principle

The MM (which stands for majorize-minimize in minimization problems, and minimize-majorize in maximization problems) principle is a prescription for constructing solutions to optimization problems. An MM algorithm minimizes an objective function by successively minimizing, at each iteration, a judiciously chosen objective function that is known as a majorizing function. Whenever a majorizing function is optimized, in principle, a step is taken toward reaching the minimizer of the original objective function. A brief summary of the MM principle is now given with reference to FIG. 5.

Let ƒ be a function to be minimized over some domain D∈

^(L), i.e., the function's minimum value is to be found within the given domain. A real value function g with domain D×D is said to majorize ƒ if g(x,y)≧ƒ(x) for all x,y∈D  (18) g(x,x)=ƒ(x) for all x∈D.  (19)

Suppose the majorizing function g is easier to minimize than the original objective function ƒ. Then, the MM algorithm for minimizing ƒ is given by

$\begin{matrix} {{x^{({m + 1})} = {\underset{x\;\varepsilon\; D}{\arg\;\min}\mspace{14mu}{g\left( {x,x^{(m)}} \right)}}},} & (20) \end{matrix}$ where x^((m)) is the current estimate for the minimizer of ƒ. The algorithm defined by (20), which is illustrated in FIG. 5, is guaranteed to monotonically decrease the objective function ƒ with increasing iteration. In other words, a further minimal or smaller value for the function ƒ is obtained with each iteration of the algorithm, stepping between successive functions g. To see this result, first observe that, by definition, g(x ^((m+1)) ,x ^((m)))≦g(x ^((m)) ,x ^((m))).  (21) Now from (18) and (19), it follows that ƒ(x ^((m+1)))≦g(x ^((m+1)) ,x ^((m)))≦g(x ^((m)) ,x ^((m)))=ƒ(x ^((m))).  (22) In other words and as illustrated in FIG. 5, the function g(x, x^((n))) intersects with the function ƒ(x) at x^((n)) and also has a further minimum at point x^((n+1)). That further minimum point is used in the next iteration as a new g(x^((n))) from which a new minimum at a new x^((n+1)) may be determined.

Maximum A Posteriori Estimation: MAP Objective Function

There are many estimation methods that could be used to estimate the reflectance coefficients. The described approach uses the MAP method because it allows for the incorporation of a priori information in a relatively straightforward manner. Let Y and X represent the random vectors underlying the data y and reflectance coefficient vector x, respectively. From the white Gaussian noise assumption, it follows from (16) that the conditional density function of Y given that X=x is a multivariate Gaussian density function with mean E[Y]=Ax and covariance matrix C=σ²I_(K), where I_(K) is the K×K identity matrix. The a posteriori density function of X given that Y=y can be expressed as

$\begin{matrix} {{f_{X❘Y}\left( {x❘y} \right)} = \frac{{f_{Y❘X}\left( {y❘x} \right)}{f_{X}(x)}}{f_{Y}(y)}} & (23) \end{matrix}$ where ƒ_(X)(x) and ƒ_(Y)(y) are the joint density functions of the reflectance coefficients and data, respectively. Under the additional assumption that the reflectance coefficients are independent and identically distributed, the MAP estimate is given by

$\begin{matrix} {\begin{matrix} {{\hat{x}}_{MAP} = {\arg\;{\max\limits_{x}\mspace{14mu}{f_{x❘y}\left( {x❘y} \right)}}}} \\ {= {{\arg\;{\min\limits_{x}{- {\log\left\lbrack {\frac{1}{\left( {\left( {2\;\pi} \right)^{K}{C}} \right)^{\frac{1}{2}}}{\exp\left( {{- \frac{1}{2}}\left( {y - {Ax}} \right)^{T}{C^{- 1}\left( {y - {Ax}} \right)}} \right)}} \right\rbrack}}}} - (25)}} \\ {\log\;{f_{x}(x)}} \\ {= {{\arg\;{\min\limits_{x}\mspace{14mu}{\frac{K}{2}{\log\left( \sigma^{2} \right)}}}} + {\frac{1}{2\;\sigma^{2}}{\phi_{1}(x)}} + {{\phi_{2}(x)}(26)}}} \end{matrix}\mspace{20mu}{where}} & (24) \\ {\mspace{79mu}{{\phi_{1}(x)}\overset{\bigtriangleup}{=}{{y - {Ax}}}_{2}^{2}}} & (27) \\ {\mspace{79mu}{{\phi_{2}(x)}\overset{\bigtriangleup}{=}{{- \log}{\prod\limits_{l = 1}^{L}\;{f\left( x_{l} \right)}}}}} & (28) \end{matrix}$ and ƒ is the chosen prior density function for the reflectance coefficients. In the following sections, several density functions are described that can provide reasonable models for the prior distribution of the reflectance coefficients.

First Prior Probability Function:

Joint Probability Density Function of Reflectance Coefficients

To promote sparsity, the density function ƒ should be an even function with long tails, and have the property that values near zero occur with high probability. The first approach follows

$\begin{matrix} {{f\left( x_{l} \right)}\overset{\bigtriangleup}{=}\frac{k(\lambda)}{1 + {\exp\left( {\lambda{x}} \right)}}} & (29) \end{matrix}$ where the constant λ controls the degree of sparsity and k(λ) is a normalizing constant. The value of the normalizing constant was determined to be

${k(\lambda)} = {\frac{\lambda}{\log\; 4}.}$

Proposed MAP Algorithm for First Prior Probability Function

We define φ to be the MAP objective function and express this function as (with the first two terms being the data component and the last term representing the prior probability density function)

$\begin{matrix} {{{\phi(x)} = {{\frac{1}{2\;\sigma^{2}}{\phi_{1}(x)}} + {\frac{K}{2}\log\;\sigma^{2}} + {\sum\limits_{l = 1}^{L}\;{s\left( x_{l} \right)}}}}{where}} & (30) \\ {{\phi_{1}(x)}\overset{\bigtriangleup}{=}{{y - {Ax}}}_{2}^{2}} & (31) \\ {{s(a)}\overset{\bigtriangleup}{=}{{{- \log}\;{f(a)}} = {{- \log}\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{a}} \right)}}}}} & (32) \end{matrix}$ From (46), it follows that to minimize φ we must find majorizing functions for both φ₁ and s.

De Pierro showed in “A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography,” Medical Imaging Processing, IEEE Transactions on, vol. 14, no. 1, pp. 132-137, 1995, and incorporated by reference, that a majorizing function for φ₁ at the current iterate x^((m)) is given by

$\begin{matrix} {{q_{1}\left( {x,x^{(m)}} \right)} = {\sum\limits_{k = 1}^{K}\;\left( {y_{k}^{2} - {2\;{y_{k}\lbrack{Ax}\rbrack}_{k}} + {\sum\limits_{l = 1}^{L}\;{r\left( {x,x^{(m)}} \right)}}} \right)}} & (33) \end{matrix}$ where

${{r\left( {x,x^{(m)}} \right)}\overset{\bigtriangleup}{=}{c_{kl}\left( {{n_{k}A_{kl}x_{l}} - {n_{k}A_{kl}x_{l}^{(m)}} + \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}^{2}},{\lbrack{Ax}\rbrack_{k}\overset{\bigtriangleup}{=}{\sum\limits_{l = 1}^{L}\;{A_{kl}x_{l}}}}$ is the k^(th) component of the vector Ax, n_(k) is the number of non-zero elements in the k^(th) row of A, and

$\begin{matrix} {c_{kl}\overset{\Delta}{=}\left\{ \begin{matrix} {n_{k}^{- 1},} & {A_{kl} \neq 0} \\ {0,} & {A_{kl} = 0} \end{matrix} \right.} & (34) \end{matrix}$

Using the theorem 4.5 in J. de Leeuw and K. Lange, “Sharp quadratic majorization in one dimension,” Computational statistics and data analysis, vol. 53, no. 7, pp. 2471-2484, 2009, which is incorporated by reference herein, the best quadratic majorizer for the function s at the point b is given by

$\begin{matrix} {{q_{2}\left( {a,b} \right)} = {{\frac{s^{\prime}(b)}{2\; b}\left( {a^{2} - b^{2}} \right)} + {s(b)}}} & (35) \end{matrix}$ where the derivative of s is equal to

$\begin{matrix} {{s^{\prime}(a)} = \frac{\lambda\; a\;{\exp\left( {\lambda{a}} \right)}}{{a}\left\lbrack {1 + {\exp\left( {\lambda{a}} \right)}} \right\rbrack}} & (36) \end{matrix}$ Thus, a majorizing function for the MAP objective function φ is obtained by substituting q₁ for φ₁ and q₂ for s in (46)

$\begin{matrix} {{Q\left( {x,x^{m}} \right)} = {{\frac{1}{2\sigma^{2}}{q_{1}\left( {x,x^{(m)}} \right)}} + {\frac{K}{2}\log\;\sigma^{2}} + {\sum\limits_{l = 1}^{L}\;\left\lbrack {{\frac{s^{\prime}\left( x_{l}^{(m)} \right)}{2\; x_{l}^{(m)}}\left( {x_{l}^{2} - x_{l}^{2{(m)}}} \right)} + {s\left( x_{l}^{(m)} \right)}} \right\rbrack}}} & (37) \end{matrix}$

Taking the derivative of the majorizing function Q with respect to x_(l) and setting the result to zero yields the desired update for the l^(th) reflectance coefficient, l=1, 2, . . . , L

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{{\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - {\left\lbrack {Ax}^{(m)} \right\rbrack k}} \right)}} + {\sum\limits_{k = 1}^{K}\;{n_{k}A_{kl}^{2}x_{l}^{(m)}}}}{{\sum\limits_{k = 1}^{K}\;{n_{k}A_{kl}^{2}}} + \frac{\sigma^{2}{s^{\prime}\left( x_{l}^{(m)} \right)}}{x_{l}^{(m)}}}} & (38) \end{matrix}$

Estimation of σ² and λ for First Prior Density Function

One possible choice for the initial reflectance vector is the estimate obtained from the DAS algorithm, which we denote by x_(DAS). However, the quantity [Ax_(DAS)]_(k), which is an estimate of the

noise-free data point, is typically much greater than the k^(th) observed data point, y_(k). Therefore, we use x⁽⁰⁾

ĉx_(DAS) as the initial estimate where

$\begin{matrix} {\hat{c} = {{\arg{\min\limits_{c}{{y - {cAx}_{DAS}}}_{2}^{2}}} = \frac{\sum\limits_{k = 1}^{K}\;{y_{k}\left\lbrack {Ax}_{DAS} \right\rbrack}_{k}}{\sum\limits_{k = 1}^{K}\;\left\lbrack {Ax}_{DAS} \right\rbrack_{k}^{2}}}} & (39) \end{matrix}$

The parameters σ² and λ are chosen to be

$\begin{matrix} {{\hat{\sigma}}^{2} = {{\arg{\min\limits_{\sigma^{2}}{\phi\left( {\sigma^{2},x^{(0)}} \right)}}} = \frac{{{y - {Ax}^{(0)}}}_{2}^{2}}{K}}} & (40) \\ {\hat{\lambda} = {\arg{\min\limits_{\lambda}{\phi\left( {\lambda,x^{(0)}} \right)}}}} & (41) \end{matrix}$ where the optimization problem in (41) is solved using a 1D line search. With these choices, σ² and λ in (38) are replaced by {circumflex over (σ)}² and {circumflex over (λ)} to estimate the reflectance coefficients in this approach.

Second and Third Prior Probability Functions:

Jeffreys' Non-Informative Prior and Laplacian-Like Prior

The second approach for the prior probability density function is the Jeffreys' non-informative prior

$\begin{matrix} {{f(a)}\overset{\Delta}{=}{\prod\limits_{l = 1}^{L}\;\frac{1}{a}}} & (42) \end{matrix}$ which is an improper prior because its area is infinite. This prior is in fact an amplitude-scaled invariance prior, which means that the units in which a quantity is measured do not influence the resulting conclusion. It turns out that the Jeffreys' prior is an extremely heavy-tailed density function and therefore is able to enforce the sparsity assumption. The Jeffreys' prior is parameter-free and thus may be suitable for a wide-range of applications.

We refer to the third approach as the Laplacian-like prior, which follows

$\begin{matrix} {{f\left( {a;\lambda} \right)}\overset{\Delta}{=}\frac{k(\lambda)}{1 + {\exp\left( {\lambda{a}} \right)}}} & (43) \end{matrix}$ where the constant λ controls the degree of sparsity and k(λ) is a normalizing constant. The value of the normalizing constant was determined to be

${k(\lambda)} = \frac{\lambda}{\log\; 4}$ by solving the equation

$\begin{matrix} {{\int_{- \infty}^{\infty}{\frac{k(\lambda)}{1 + {\exp\left( {\lambda{a}} \right)}}\ d\; a}} = 1} & (44) \end{matrix}$ Hence, the proposed sparsity inducing probability density function can be expressed as

$\begin{matrix} {{f\left( {a;\lambda} \right)} = \frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{a}} \right)}}} & (45) \end{matrix}$ FIG. 6 illustrates plots of the Laplacian-like prior density function 608 and the Laplacian probability density function 612 for the purpose of making a comparison.

GMR Algorithm for Jeffreys' and Laplacian-Like Priors

In this section, we present the MM based GMR algorithm, which reconstructs GPR images by iteratively minimizing the negative MAP objective function, for the Jeffreys' and Laplacian-like priors. First, we consider the case where the noise variance, σ², and prior parameter, θ, are known. Then we consider the case where both of these quantities are unknown.

Where the noise variance and PDF parameter are known, it will be convenient to define a function φ to be the objective function on the right hand side of (26) and express this function as

$\begin{matrix} {{{\phi(x)} = {{\frac{1}{2\sigma^{2}}{\phi_{1}(x)}} + {\frac{K}{2}{\log\left( \sigma^{2} \right)}} + {\sum\limits_{l = 1}^{L}\;{s\left( {x_{l};\theta} \right)}}}}{where}} & (46) \\ {{\phi_{1}(x)}\overset{\Delta}{=}{{y - {Ax}}}_{2}^{2}} & (47) \\ {{s\left( {a;\theta} \right)}\overset{\Delta}{=}{{- \log}\;{f\left( {a;\theta} \right)}}} & (48) \end{matrix}$ For convenience, we will refer to the function s(•; θ) as the negative log prior.

From (46), it follows that to minimize the negative MAP objective function φ using the MM technique we must find majorizing functions for both φ₁ and s. In the context of reconstructing positron emission tomography images, De Pierro developed a majorizing function for linear least-squares objective functions (see A. R. De Pierro, “A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography” IEEE transactions, medical imagery pp 132-137, 1995, which is incorporated by reference herein). Using his result, a majorizing function for φ₁ at the current iterate x^((m)) is given by

$\begin{matrix} {{{q_{1}\left( {x;x^{(m)}} \right)} = {\sum\limits_{k = 1}^{K}\;\left( {y_{k}^{2} - {2\;{y_{k}\lbrack{Ax}\rbrack}_{k}} + {\sum\limits_{l = 1}^{L}\;{r\left( {x_{l},x^{(m)}} \right)}}} \right)}}{where}} & (49) \\ {{r\left( {x_{l},x^{(m)}} \right)}\overset{\Delta}{=}{c_{kl}\left( {{n_{k}A_{kl}x_{l}} - {n_{k}A_{kl}x_{l}^{(m)}} + \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}^{2}} & (50) \\ {n_{k}\overset{\Delta}{=}{{number}\mspace{14mu}{of}\mspace{14mu}{non}\text{-}{zero}\mspace{14mu}{elements}\mspace{14mu}{in}\mspace{14mu}{the}\mspace{14mu} k^{th}\mspace{14mu}{row}\mspace{14mu}{of}\mspace{14mu} A}} & (51) \\ {c_{kl}\overset{\Delta}{=}\left\{ {\begin{matrix} {n_{k}^{- 1},} & {A_{kl} \neq 0} \\ {0,} & {A_{kl} = 0} \end{matrix},} \right.} & (52) \\ {\lbrack{Ax}\rbrack_{k}\overset{\Delta}{=}{\sum\limits_{l = 1}^{L}\;{A_{kl}x_{l}}}} & (53) \end{matrix}$

Now, we address the problem of determining a majorizing function for the function s(•;θ) by exploiting the following theorem by de Leeuw and Lang (J. de Leeuw and K. Lange, “Sharp Quadratic Majorization in One Dimension” Computational Statistics and Data Analysis vol. 53 no. 1 pp 2478 February 2004, which is incorporated by reference herein) Suppose d(a) is an even, differentiable function on R such that the ratio d′(a)/a is decreasing on (0, ∞). Then, the following function

$\begin{matrix} {{z\left( {a,b} \right)}\overset{\Delta}{=}{{\frac{d^{\prime}(b)}{2\; b}\left( {a^{2} - b^{2}} \right)} + {d(b)}}} & (54) \end{matrix}$ is the best quadratic majorizing function for d at the point b. Assuming that the negative log prior satisfies the conditions of de Leeuw and Lang's theorem, then a majorizing function for this function at the point b is

$\begin{matrix} {{z\left( {a,{b;\theta}} \right)}\overset{\Delta}{=}{{\frac{s^{\prime}\left( {b;\theta} \right)}{2\; b}\left( {a^{2} - b^{2}} \right)} + {s\left( {b;\theta} \right)}}} & (55) \end{matrix}$ In turn, it follows that a majorizing function for the term

$\sum\limits_{l = 1}^{L}\;{s\left( {x_{l};\theta} \right)}$ in (46) about the current iterate x^((m)) is

$\begin{matrix} {\sum\limits_{l = 1}^{L}\;\left\lbrack {{\frac{s^{\prime}\left( {x_{l}^{(m)};\theta} \right)}{2\; x_{l}^{(m)}}\left( {x_{l}^{2} - x_{l}^{2{(m)}}} \right)} + {s\left( {x_{l}^{(m)};\theta} \right)}} \right\rbrack} & (56) \end{matrix}$

Given the results in (49) and (56), we can conclude that

$\begin{matrix} {{Q\left( {x,x^{(m)}} \right)} = {{\frac{1}{2\sigma^{2}}{q_{1}\left( {x,x^{(m)}} \right)}} + {\frac{K}{2}\log\;\sigma^{2}} + {\sum\limits_{l = 1}^{L}\;\left\lbrack {{\frac{s^{\prime}\left( {x_{l}^{(m)};\theta} \right)}{2\; x_{l}^{(m)}}\left( {x_{l}^{2} - x_{l}^{2{(m)}}} \right)} + {s\left( {x_{l}^{(m)};\theta} \right)}} \right\rbrack}}} & (57) \end{matrix}$ is a majorizing function for the negative MAP objective function φ about the current iterate x^((m)). The proposed GMR algorithm of this approach follows by taking the derivative of Q with respect to x_(t) and setting the result to zero. From straightforward calculations

$\begin{matrix} {\frac{\partial{Q\left( {x,x^{(m)}} \right)}}{\partial x_{t}} = {{\frac{1}{\sigma^{2}}\left\lbrack {{- {\sum\limits_{k = 1}^{K}\;{A_{kt}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}} + {\left( {x_{t} - x_{t}^{(m)}} \right){\sum\limits_{k = 1}^{K}\;{n_{k}A_{kt}^{2}}}}} \right\rbrack} + {x_{t}\frac{s^{\prime}\left( {x_{t}^{(m)};\theta} \right)}{x_{t}^{(m)}}}}} & (58) \end{matrix}$ Equating the derivative in (58) to zero yields the desired update

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + \frac{\sigma^{2}{s^{\prime}\left( {x_{l}^{(m)};\theta} \right)}}{x_{l}^{(m)}}}} & (59) \end{matrix}$ where G_(l) ^((m)) and H_(l) are defined as,

$G_{l}^{(m)}\overset{\Delta}{=}{\sum\limits_{k = 1}^{K}{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$

$\begin{matrix} {G_{l}^{(m)}\overset{\Delta}{=}{\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}} & (60) \\ {H_{l}\overset{\Delta}{=}{\sum\limits_{k = 1}^{K}\;{n_{k}A_{kl}^{2}}}} & (61) \end{matrix}$

A careful observation of (59) reveals that the process of estimating the individual reflection coefficients is decoupled in the sense that, for any voxel, the computation of the next estimate depends only on the previous estimate. As a result, this algorithm can be easily parallelized to enhance its computational speed. It should also be mentioned that a fast memory efficient method for computing (60) and (61) has been developed (see U.S. patent application Ser. No. 14/184,446 filed Feb. 19, 2014, published as U.S. Pat. App. Pub. No. 2015/0279082, which is incorporated by reference herein) and which fast methods are described below. Therefore, this GMR algorithm is expected to be applicable to real-world GPR imaging problems with high dimensionality data.

For the Jeffreys' prior, we have that

$\begin{matrix} {{s(a)} = {{- \log}\frac{1}{a}}} & (62) \\ {\frac{s^{\prime}(a)}{a} = \frac{1}{a^{2}}} & (63) \end{matrix}$ Because the conditions of de Leeuw and Lange's theorem are met, the GMR algorithm for the Jeffreys' prior is

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{1}{\left( x_{l}^{(m)} \right)^{2}}}}} & (64) \end{matrix}$

For the Laplacian-like prior, the negative log prior s(a;λ) and function

$\frac{s^{\prime}\left( {a;\lambda} \right)}{a}$ follow

$\begin{matrix} {{s\left( {a;\lambda} \right)} = {{- \log}\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{a}} \right)}}}} & (65) \\ {\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = \frac{{\lambda exp}\left( {\lambda{a}} \right)}{{a}\left( {1 + {\exp\left( {\lambda{a}} \right)}} \right)}} & (66) \end{matrix}$ Like the Jeffreys' prior, the Laplacian-like prior satisfies the conditions of de Leeuw and Lange's theorem so the corresponding GMR algorithm is

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{{\lambda exp}\left( {\lambda{x_{l}^{(m)}}} \right)}{{x_{l}^{(m)}}\left( {1 + {\exp\left( {\lambda{x_{l}^{(m)}}} \right)}} \right)}}}} & (67) \end{matrix}$

Similarly, as an alternative approach, the Laplacian prior can be used as well. In this case, the negative log prior s(a;λ) and function

$\frac{s^{\prime}\left( {a;\lambda} \right)}{a}$ for the Laplacian prior follow

$\begin{matrix} {{s\left( {a;\lambda} \right)} = {{- \log}\frac{\lambda}{2}{\exp\left( {{- \lambda}{a}} \right)}}} & (68) \\ {\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = \frac{\lambda}{a}} & (69) \end{matrix}$ Like the Jeffreys' prior and Laplacian-like prior, the Laplacian prior satisfies the conditions of de Leeuw and Lange's theorem so the corresponding GMR algorithm is

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{\lambda}{x_{l}^{(m)}}}}} & (70) \end{matrix}$

Previously, we assumed that the noise variance, σ², and prior parameter, θ, were known. However, oftentimes in practice these quantities are unknown. We present in this section a procedure to jointly estimate the reflection coefficient vector, noise power and prior parameter. For simplicity, we will consider the Laplacian-like prior parameter.

First, we modify the notation to account for the fact that the negative MAP objective function, φ, and majorizing function, Q, now explicitly depend on λ and σ². Thus, φ(x) and Q(x; x^((m))) are now expressed as φ(x, λ, σ²) and Q(x, λ, σ²; x^((m))), respectively The cyclic optimization method given below could in principle be used to jointly estimate the reflectance coefficient vector, x, noise variance, σ², and Laplacian-like prior parameter, λ:

-   -   for m=0, 1, . . . , niter do

$\begin{matrix} {x^{({m + 1})} = {\arg{\min\limits_{x}{\phi\left( {x,\lambda^{(m)},\sigma^{2{(m)}}} \right)}}}} & (71) \\ {\lambda^{({m + 1})} = {\arg{\min\limits_{\lambda > 0}{\phi\left( {x^{({m + 1})},\lambda,\sigma^{2{(m)}}} \right)}}}} & (72) \\ {\sigma^{2{({m + 1})}} = {\arg{\min\limits_{\sigma^{2} > 0}{\phi\left( {x^{({m + 1})},\lambda^{({m + 1})},\sigma^{2}} \right)}}}} & (73) \end{matrix}$

-   -   end for         where λ^((m)) and σ^(2(m)) are the m^(th) iterates of λ and σ²,         respectively.

Instead of solving the minimization problem in (71) it will be convenient to find an iterate x^((m+1)) that decreases the negative MAP objective function φ(x, λ, σ²) in the following sense φ(x ^((m+1)),λ^((m)),σ^(2(m)))≦φ(x ^((m)),λ^((m)),σ^(2(m)))  (74) Taking into account the discussion above regarding noise variance and PDF parameters, an iterate that satisfies (74) can be found by minimizing the majorizing function Q(x, λ^((m)), σ^(2(m)); x^((m))) (i.e., (57) with θ and σ² replaced with λ^((m)) and σ^(2(m)), respectively. Thus, a solution to (71) is

$\begin{matrix} {x^{({m + 1})}\overset{\Delta}{=}{\arg{\min\limits_{x}{Q\left( {x,\lambda^{(m)},{\sigma^{2{(m)}};x^{(m)}}} \right)}}}} & (75) \end{matrix}$ The minimization problem in (72) can be solved using a one-dimensional line search such as the golden section search method. Implicitly, we have assumed that an interval can be determined that contains the minimizer of φ(x^((m+1)), λ, σ^(2(m))). Finally, we estimate the noise power by obtaining a closed form expression for the minimizer of φ(x^((m+1)), λ^((m+1)), σ²). Putting the above ideas altogether, an alternative to the algorithm defined by (71)-(73) is summarized below

-   -   for m=0, 1, . . . , niter do

$\begin{matrix} {x^{({m + 1})} = {\arg{\min\limits_{x}{Q\left( {x,\lambda^{(m)},{\sigma^{2{(m)}};x^{(m)}}} \right)}}}} & (76) \\ \begin{matrix} {x^{({m + 1})} = {\arg{\min\limits_{\lambda_{l} \leq \lambda \leq \lambda_{k}}{\phi\left( {x^{({m + 1})},\lambda,\sigma^{2{(\min)}}} \right)}}}} \\ {= {\arg{\min\limits_{\lambda_{l} \leq \lambda \leq \lambda_{k}}{\sum\limits_{l = 1}^{L}\;{{- \log}\;{f\left( {x_{l}^{({m + 1})};\lambda} \right)}(78)}}}}} \end{matrix} & (77) \\ \begin{matrix} {{\sigma^{2}}^{({m + 1})} = {\arg{\min\limits_{\sigma^{2}}{\phi\left( {x^{({m + 1})},\lambda^{({m + 1})},\sigma^{2}} \right)}}}} \\ {{\arg{\min\limits_{\sigma^{2}}{\frac{K}{2}\log\;\sigma^{2}}}} + {\frac{1}{2\sigma^{2}}{\phi_{1}\left( x^{({m + 1})} \right)}(80)}} \end{matrix} & (79) \end{matrix}$

-   -   end for

We will now show that the cyclic minimization algorithm defined by (76)-(80) is guaranteed to monotonically decrease the MAP objective function φ(x, λ, σ²). From (76), it follows that φ(x^((m)), λ^((m)), σ^(2(m)))≧φ(x^((m+1)), λ^((m)), σ^(2(m))). Also (78) implies that φ(x^((m+1)), λ^((m)), σ^(2(m)))≧φ(x^((m+1)), λ^((m+1)), σ^(2(m))). Finally, it can be observed from (80) that φ(x^((m+1)), λ^((m+1)), σ^(2(m)))≧φ(x^((m+1)), λ^((m+1)), σ^(2(m+1))). Therefore, φ(x ^((m)),λ^((m)),σ^(2(m)))≧φ(x ^((m+1)),λ^((m+1)),σ^(2(m+1)))  (81) Thus, the iterates obtained from the cyclic minimization technique described above are guaranteed to monotonically decrease the objective function φ(x, λ, σ²).

The solution to (76) was discussed above and is given by (67) with λ and σ² replaced by λ^((m)) and σ^(2(m)), respectively

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2{(m)}}\frac{\lambda^{(m)}{\exp\left( {\lambda^{(m)}{x_{l}^{(m)}}} \right)}}{{x_{l}^{(m)}}\left( {1 + {\exp\left( {\lambda^{(m)}{x_{l}^{(m)}}} \right)}} \right.}}}} & (82) \end{matrix}$ To obtain the estimate for the noise power, we set the derivative of φ(x^((m+1)), λ^((m+1)), σ²) with respect to σ² to zero and solve the resulting equation. The derivative of φ(x^((m+1)), λ^((m+1)), σ²) with respect to σ² equals

$\begin{matrix} {\frac{\partial{\phi\left( {x^{({m + 1})},\lambda^{({m + 1})},\sigma^{2}} \right)}}{\partial\sigma^{2}} = {\frac{K}{2\sigma^{2}} - {\frac{1}{2\left( \sigma^{2} \right)^{2}}{{y - {Ax}^{({m + 1})}}}_{2}^{2}}}} & (83) \end{matrix}$ Setting this derivative to zero yields for this approach the maximum likelihood estimate of the noise power

$\begin{matrix} {\sigma^{2{({m + 1})}} = {\frac{1}{K}{{y - {Ax}^{({m + 1})}}}_{2}^{2}}} & (84) \end{matrix}$

Now, given our discussion above, it follows that for the case of the Laplacian prior that the update for the reflection coefficient vector is equal to

$\begin{matrix} {x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2{(m)}}\frac{\lambda(m)}{x_{l}^{(m)}}}}} & (85) \end{matrix}$ Recall the following well known result: Given n observations w₁, w₂, . . . , w_(n) of a Laplacian probability density function with unknown parameter μ, the maximum likelihood estimate of μ is

$\hat{\mu} = {\frac{1}{n}{\sum\limits_{r = 1}^{n}\;{{w_{r}}.}}}$ From this result, we can conclude that

$\begin{matrix} {\lambda^{({m + 1})} = {\frac{1}{L}{\sum\limits_{l = 1}^{L}\;{x_{l}^{({m + 1})}}}}} & (86) \end{matrix}$ is the solution to (78) for the Laplacian prior. As in the Laplacian-like prior case, the update for the noise variance is given by (84). Finally, because the Jeffreys' prior does not depend on a parameter, the update for x is given by (64) with σ² replaced by σ^(2(m)) in (84).

Fourth Prior Probability Function: Butterworth Prior

In this approach, we present a novel density function, known as the Butterworth density function, which can be used to incorporate the sparsity assumption. The Butterworth density function is defined to be

$\begin{matrix} {{{f_{BW}\left( {a;\varepsilon} \right)}\overset{\Delta}{=}\frac{k_{n}(\varepsilon)}{1 + \left( \frac{a}{\varepsilon} \right)^{2\; n}}}{where}} & (87) \\ {{k_{n}(\varepsilon)}\overset{\Delta}{=}\left\lbrack {\int_{- \infty}^{\infty}{\frac{1}{1 + \left( \frac{a}{\varepsilon} \right)^{2\; n}}\ d\; a}} \right\rbrack^{- 1}} & (88) \end{matrix}$ is a normalization factor, and ∈ and n are parameters that control the width and decay rate of the density function, respectively. FIG. 7 illustrates a plot of the Butterworth density function for ∈=0.1 and n=3. One should observe that for large n the Butterworth density function approximates the uniform distribution and

${k_{n}(\varepsilon)} \approx {\frac{1}{2\varepsilon}.}$ It should also be noted that we fix the parameter n for our application so we do not explicitly include it in the notation for the Butterworth density function. With the Butterworth density function, the negative log prior becomes

$\begin{matrix} {{\phi_{2}(x)}\overset{\Delta}{=}{{- \log}{\prod\limits_{l = 1}^{L}\;{f_{BW}\left( {x_{l};\varepsilon} \right)}}}} & (89) \end{matrix}$

MAP Approach for Butterworth Prior

A MAP algorithm using the Butterworth density function as the prior density function is developed as follows.

The MAP algorithms we have developed are based on the majorize-minimize (MM) methodology so we now introduce this concept. An MM algorithm reduces an objective function by iteratively minimizing a carefully chosen majorizing function. Let ƒ be a function to be minimized within some domain D ⊂

^(L). A real valued function g is said to be a majorizing function for ƒ at the point y ∈ D if g(x,y)≧ƒ(x),∀x,y∈D  (90) g(y,y)=ƒ(y),∀y∈D  (91) If we succeed in obtaining a majorizing function for ƒ, then the associated MM algorithm for minimizing ƒ is given by

$\begin{matrix} {x^{({m + 1})} = {\arg{\min\limits_{x \in D}{{g\left( {x,x^{(m)}} \right)}.}}}} & (92) \end{matrix}$ From (90) and (91), it follows that ƒ(x ^((m+1)))≦g(x ^((m+1)) ,x ^((m)))≦g(x ^((m)) ,x ^((m)))=ƒ(x ^((m)))  (93) where the second inequality follows from (92). This implies that the iterates monotonically decrease the function ƒ as the iteration number increases.

Recall the negative MAP objective function is given by

$\begin{matrix} {{\phi(x)}\overset{\Delta}{=}{{\frac{K}{2}{\log\left( \sigma^{2} \right)}} + {\frac{1}{2\sigma^{2}}{\phi_{1}(x)}} + {\phi_{2}(x)}}} & (94) \end{matrix}$ Note that if we find functions q₁(•; y) and q₂(•; y) that are majorizing functions for φ₁ and φ₂, respectively, where y ∈

^(L), then

$\begin{matrix} {{q\left( {x;y} \right)}\overset{\Delta}{=}{{\frac{K}{2}{\log\left( \sigma^{2} \right)}} + {\frac{1}{2\sigma^{2}}{q_{1}\left( {x;y} \right)}} + {q_{2}\left( {x;y} \right)}}} & (95) \end{matrix}$ is a majorizing function for φ about the point y. In this case, the associated MM algorithm is given by

$\begin{matrix} {x^{({m + 1})} = {\arg\;{\min\limits_{x}\;{q\left( {x;x^{(m)}} \right)}}}} & (96) \end{matrix}$

Now we must determine majorizing functions q₁(•; y) and q₂(•; y). As noted above, De Pierro constructed a majorizing function for linear least-squares objective functions such as φ₁. We now summarize his result for the Butterworth prior. First, we rewrite the least-squares objective function as

$\begin{matrix} {{\phi_{1}(x)} = {\sum\limits_{k = 1}^{K}\;\left( {y_{k}^{2} - {{2\lbrack{Ax}\rbrack}{{}_{}^{}{}_{}^{}}} + \left( \lbrack{Ax}\rbrack_{k} \right)^{2}} \right)}} & (97) \end{matrix}$ where y_(k) is the k^(th) component of y. We can obtain a majorizing function for φ₁ by first finding a majorizing function for ([Ax]_(k))², where [Ax]_(k) is the k^(th) element of the vector Ax. The term ([Ax]_(k))² can be written as

$\begin{matrix} {\begin{matrix} {\left( \lbrack{Ax}\rbrack_{k} \right)^{2}\overset{\bigtriangleup}{=}\left( {\sum\limits_{l = 1}^{L}\;{A_{kl}x_{l}}} \right)^{2}} \\ {= \left( {\sum\limits_{l = 1}^{L}\;{\lambda_{kl}\left( {{\frac{1}{\lambda_{kl}}A_{kl}x_{l}} - {\frac{1}{\lambda_{kl}}A_{kl}x_{l}^{(m)}} + \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}} \right)^{2}} \end{matrix}{where}} & (98) \\ {\lambda_{kl}\overset{\bigtriangleup}{=}\left\{ \begin{matrix} \frac{1}{n_{k}} & {{{{for}\mspace{14mu}{all}\mspace{14mu} A_{kl}} \neq 0};} \\ 0 & {o.w.} \end{matrix} \right.} & (99) \\ {n_{k}\overset{\bigtriangleup}{=}{{number}\mspace{14mu}{of}\mspace{14mu}{nonzero}\mspace{14mu}{elements}\mspace{14mu}{in}\mspace{14mu}{row}\mspace{14mu} k}} & (100) \end{matrix}$ Note that

${\sum\limits_{l = 1}^{L}\;\lambda_{kl}} = 1$ for all k and λ_(kl)≧0 for all k, l. Therefore, by convexity of the square function,

$\begin{matrix} {{\left( \lbrack{Ax}\rbrack_{k} \right)^{2} \leq {r\left( {x;x^{(m)}} \right)}}{where}} & (101) \\ {{r\left( {x;x^{(m)}} \right)}\overset{\bigtriangleup}{=}{\sum\limits_{l = 1}^{L}\;{\lambda_{kl}\left( {{\frac{1}{\lambda_{kl}}A_{kl}x_{l}} - {\frac{1}{\lambda_{kl}}A_{kl}x_{l}^{(m)}} + \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}^{2}}} & (102) \end{matrix}$ It is easy to see that r(x^((m)); x^((m)))=([Ax^((m))]_(k))². Thus, r(•; x^((m))) is a majorizing function for ([Ax]_(k))². If we replace ([Ax]_(k))² by r(x; x^((m))) in (97), we obtain the desired majorizing function for the least-squares objective function

$\begin{matrix} {{q_{1}\left( {x;x^{(m)}} \right)}\overset{\bigtriangleup}{=}{\sum\limits_{k = 1}^{K}\;\left( {y_{k}^{2} - {{2\lbrack{Ax}\rbrack}{{}_{}^{}{}_{}^{}}} + {r\left( {x;x^{(m)}} \right)}} \right)}} & (103) \end{matrix}$

We now turn to the problem of finding a majorizing function for φ₂, which we now write as

$\begin{matrix} {{\phi_{2}(x)}\overset{\bigtriangleup}{=}{{{- \log}{\prod\limits_{l = 1}^{L}\;{f_{BW}\left( {x_{l};\varepsilon} \right)}}} = {\sum\limits_{l = 1}^{L}\;{g\left( {x_{l};\varepsilon} \right)}}}} & (104) \end{matrix}$ where g(a; ∈)

−log ƒ_(BW)(a; ∈). As discussed above, de Leeuw and Lange showed that for a univariate function p, the function on

²

$\begin{matrix} {{h\left( {x,y} \right)}\overset{\bigtriangleup}{=}{{p(y)} + {{p^{\prime}(y)}\left( {x - y} \right)} + {\frac{C}{2}\left( {x - y} \right)^{2}}}} & (105) \end{matrix}$ majorizes p at y, provided p is twice differentiable and C>0 is an upper bound for p″. We now use this “Taylor series” expansion method to obtain a majorizing function for φ₂. Replacing the function g in (104) by its Taylor series majorizing function results in the following majorizing function for φ₂ at x^((m))

$\begin{matrix} {{q_{2}\left( {x;x^{(m)}} \right)}\overset{\bigtriangleup}{=}{\sum\limits_{l = 1}^{L}\;\left\lbrack {{g\left( {x_{l}^{(m)};\varepsilon} \right)} + {{g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}\left( {x_{l} - x_{l}^{(m)}} \right)} + {\frac{B}{2}\left( {x_{l} - x_{l}^{(m)}} \right)^{2}}} \right\rbrack}} & (106) \end{matrix}$ where, in this case, B is the least upper bound of the second derivative of g(•; ∈). This choice for B guarantees the optimality of the majorizing function q₂. Observe that q₂(x; x^((m)))=φ₂(x^((m))) when x=x^((m)).

From (95), (103), and (106), we have the following majorizing function for the MAP objective function

$\begin{matrix} {{q\left( {x;x^{(m)}} \right)} = {{\frac{K}{2}{\log\left( \sigma^{2} \right)}} + {\frac{1}{2\;\sigma^{2}}{\sum\limits_{k = 1}^{K}\;\left( {y_{k}^{2} - {2\;{y_{k}\lbrack{Ak}\rbrack}_{k}} + {r\left( {x;x^{(m)}} \right)}} \right)}} + {q_{2}\left( {x;x^{(m)}} \right)}}} & (107) \end{matrix}$ Recall that we can obtain an algorithm for minimizing the MAP objective function using (92). Taking the derivative of q(x; x^((m))) with respect to x_(l) and setting the result to zero produces the desired MAP algorithm. The derivative of q(x; x^((m))) with respect to x_(l) equals

$\begin{matrix} {{\frac{\partial{q\left( {x;x^{(m)}} \right)}}{\partial x_{l}} = {{{- \frac{1}{\sigma^{2}}}{\sum\limits_{k = 1}^{K}\;\left( {{y_{k}A_{kl}} - {A_{kl}\left\lbrack {{n_{k}A_{kl}x_{l}} - {n_{k}A_{kl}x_{l}^{(m)}} + \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right\rbrack}} \right)}} + \frac{\partial{q_{2}\left( {x;x^{(m)}} \right)}}{\partial x_{l}}}}\mspace{20mu}{where}} & (108) \\ {\mspace{79mu}{\frac{\partial{q_{2}\left( {x;x^{(m)}} \right)}}{\partial x_{l}} = {{g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)} + {B\left( {x_{l} - x_{l}^{(m)}} \right)}}}} & (109) \end{matrix}$ Setting this derivative to zero yields the following MAP algorithm for reconstructing images from GPR data

$\begin{matrix} {{x_{l}^{({m + 1})} = {x_{l}^{(m)} + \frac{G_{l}^{(m)} - {\sigma^{2} \cdot {g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}}}{{\sigma^{2} \cdot B} + H_{l}}}}{where}} & (110) \\ {H_{l}\overset{\bigtriangleup}{=}{\sum\limits_{k = 0}^{K - 1}\;{r_{k}A_{kl}^{2}}}} & (111) \\ {G_{l}^{(m)}\overset{\bigtriangleup}{=}{\sum\limits_{k = 0}^{K - 1}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}} & (112) \\ {{g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}\overset{\bigtriangleup}{=}\frac{2\;{n\left( \frac{x_{l}^{(m)}}{\varepsilon} \right)}^{{2\; n} - 1}}{\varepsilon\left( {1 + \left( \frac{x_{l}^{(m)}}{\varepsilon} \right)^{2\; n}} \right)}} & (113) \end{matrix}$ Using the particular structure of the GPR data model, the quantities H_(l) and G_(l)(^(m)) can be computed efficiently using fast implementations as described below and developed in U.S. patent application Ser. No. 14/184,446 filed Feb. 19, 2014, published as U.S. Pat. App. Pub. No. 2015/0279082, which is incorporated by reference herein.

Automatic Selection of Noise Variance and Prior Parameter

For the Butterworth Prior

The noise variance σ² and prior parameter ∈ quantities are typically unknown in practical applications. In this section, we develop a method to adaptively estimate and update the unknown noise variance σ² and the unknown Butterworth parameter ∈. First, we modify the notation to account for the fact that the MAP objective function and majorizing function explicitly depend on ∈ and σ². Thus, φ(x), φ₂(x), q(x; x^((m))), and q₂(x; x^((m))) are now expressed as φ(x, ∈, σ²), φ₂(x, ∈), q(x, ∈, σ²; x^((m))), and q₂(x, ∈; x^((m))), respectively. (Although technically incorrect, we will refer to the quantities φ(x, ∈, σ²), φ₂(x, ∈), q(x, ∈, σ²; x^((m))), and q₂(x, ∈; x^((m))) as functions in order to simplify the discussion.) The cyclic optimization method given below could in principle be used to jointly estimate the reflectance coefficient vector, x, noise variance, σ², and Butterworth prior parameter, ∈.

-   -   for m=0, 1, . . . , niter do

$\begin{matrix} {x^{({m + 1})} = {\arg\;{\min\limits_{x}\;{\phi\left( {x,\varepsilon^{(m)},\sigma^{2{(m)}}} \right)}}}} & (114) \\ {\varepsilon^{({m + 1})} = {\arg\;{\min\limits_{\varepsilon > 0}\;{\phi\left( {x^{({m + 1})},\varepsilon,\sigma^{2{(m)}}} \right)}}}} & (115) \\ {{\sigma^{2}}^{({m + 1})} = {\arg\;{\min\limits_{\sigma^{2} > 0}\;{\phi\left( {x^{({m + 1})},\varepsilon^{({m + 1})},\sigma^{2}} \right)}}}} & (116) \end{matrix}$

-   -   end for

Instead of solving the minimization problem in (114) it will be convenient to find an iterate x^((m+1)) that decreases the MAP objective function φ(x, ∈, σ²) in the following sense φ(x ^((m+1)),∈^((m)),σ^(2(m)))≦φ(x ^((m)),∈^((m)),σ^(2(m)))  (117) Taking into account the discussion above, an iterate that satisfies (117) can be found by minimizing the majorizing function q(x, ∈^((m)), σ²(m); x^((m))) (i.e., (107)) with ∈ and σ² replaced with ∈^((m)) and σ^(2(m)), respectively. Thus, a solution to (114) is

$\begin{matrix} {x^{({m + 1})}\overset{\bigtriangleup}{=}{\arg\;{\min\limits_{x}\mspace{14mu}{q\left( {x,\varepsilon^{(m)},{\sigma^{2{(m)}};x^{(m)}}} \right)}}}} & (118) \end{matrix}$

The minimization problem in (115) is solved using a one-dimensional line search such as the golden section search method. Implicitly, we have assumed that an interval can be determined that contains the minimizer of φ(x^((m+1)), ∈, σ^(2(m))). Finally, we estimate the noise power by obtaining a closed form expression for the minimizer of φ(x^((m+1)), ∈^((m+1)), σ²). An alternative to the algorithm defined by (114)-(116) is summarized below

-   -   for m=0, 1, . . . , niter do

$\begin{matrix} {x^{({m + 1})} = {\arg\;{\min\limits_{x}\mspace{14mu}{q\left( {x,\varepsilon^{(m)},{\sigma^{2{(m)}};x^{(m)}}} \right)}}}} & (119) \\ {\varepsilon^{({m + 1})} = {\arg{\min\limits_{\varepsilon_{l} \leq \varepsilon \leq \varepsilon_{k}}\mspace{14mu}{\phi\left( {x^{({m + 1})},\varepsilon,\sigma^{2{(m)}}} \right)}}}} & (120) \\ {\sigma^{2{({m + 1})}} = {{\arg\;{\min\limits_{\sigma^{2}}\;{\frac{K}{2}\log\mspace{14mu}\sigma^{2}}}} + {\frac{1}{2\;\sigma^{2}}{\phi_{1}\left( x^{({m + 1})} \right)}}}} & (121) \end{matrix}$

-   -   end for         where the solution of (120) is assumed to lie within the         interval [∈_(l), ∈_(k)].

We will now show that the cyclic minimization algorithm defined by (119)-(121) is guaranteed to monotonically decrease the MAP objective function φ(x, ∈, σ²). From (114) and (119), it follows that φ(x^((m)), ∈^((m)), σ^(2(m)))≧φ(x^((m+1)), ∈^((m)), σ^(2(m))). Also, (120) implies that φ(x^((m+1)), ∈^((m)), σ^(2(m)))≧φ(x^((m+1)), ∈^((m+1)), σ^(2(m))). Finally, it can be observed from (121) that φ(x^((m+1)), ∈^((m+1)), σ^(2(m)))≧φ(x^((m+1)), ∈^((m+1)), σ^(2(m+1))). Therefore, φ(x ^((m)),∈^((m)),σ^(2(m)))≧φ(x ^((m+1)),∈^((m+1)),σ^(2(m+1)))  (122) Thus, the iterates obtained from the cyclic minimization technique described above are guaranteed to monotonically decrease the objective function φ(x, ∈, σ²). The solution to (119) was discussed above and is given by (110) with ∈ and σ² replaced by ∈^((m)) and σ^(2(m)) respectively

$\begin{matrix} {x_{l}^{({m + 1})} = {x_{l}^{(m)} + \frac{G_{l}^{(m)} - {\sigma^{2{(m)}} \cdot {g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}}}{{\sigma^{2{(m)}} \cdot B} + H_{l}}}} & (123) \end{matrix}$

To obtain the estimate for the noise power, we set the derivative of φ(x^((m+1)), ∈^((m+1)), σ²) with respect to σ² to zero and solve the resulting equation. The derivative of φ(x^((m+1)), ∈^((m+1)), σ²) with respect to σ² equals

$\begin{matrix} {\frac{\partial{\phi\left( {x^{({m + 1})},\varepsilon^{({m + 1})},\sigma^{2}} \right)}}{\partial\sigma^{2}} = {\frac{K}{2\sigma^{2\;}} - {\frac{1}{2\left( \sigma^{2} \right)^{2}}{{y - {Ax}^{({m + 1})}}}_{2}^{2}}}} & (124) \end{matrix}$ Setting this derivative to zero yields the maximum likelihood estimate of the noise power

$\begin{matrix} {\sigma^{2{({m + 1})}} = {\frac{1}{K}{{y - {Ax}^{({m + 1})}}}_{2}^{2}}} & (125) \end{matrix}$

Given the single Butterworth prior for the reflectance coefficients, where the noise power and the Butterworth parameter are unknown, the implementation of the above MM-based MAP algorithm, which we call Algorithm I, is now outlined

Algorithm I: Initialization:     Obtain initial estimates x⁽⁰⁾,∈⁽⁰⁾, and σ⁽⁰⁾ and set the value for the     parameter n     Compute B numerically Iteration:      for m = 0,1,...,niter do       for l = 1,2,...,L do        Compute g′(x_(l) ^((m));∈^((m))) from equation (113),Compute x_(l) ^((m+1))        from equation (123)       end for       Compute ∈^((m+1)) from equation (120) using a line search       Compute σ^(2(m+1)) from equation (125)      end for where niter is the chosen number of iterations.

Description of the Fast Implementation

When applied in a typical GPR context, the computation of the term G_(l) ^((m)) requires the K×L matrix A where K=IJN. For the above described UWB SIRE radar system, these parameters are I=43 transmit locations, J=16 receive antennas, N=1350 data samples per return-profile, and L=25000 pixels.

These parameter settings require 173 gigabytes (GB) of memory to merely store the system matrix A. Because A has many zero-elements, however, the data could be more efficiently stored as a sparse matrix. Nevertheless, a sparse representation for A would still require approximately 16 GB of memory. With such a large memory size requirement, the construction of the A matrix in the current formulation of the algorithm is not feasible or practical for typical computing platforms, especially in field deployment where on site imaging would be advantageous. Indeed, virtually any other GPR image formation method that requires explicitly constructing the system matrix would have comparable requirements.

In addition to memory size challenges, computational cost would also be an issue for the current format of the MM-based l₁-LS algorithm. At each iteration, the computation of G_(l) ^((m)) would require the matrix multiplication Ax^((m)), which has

(KL) time complexity. This operation is thus not practical for large-scale implementations where the parameters K and L are expected to be relatively large. For example, in our case, we have K=27520 and L=25000. Additional costs include the computation of the term H_(l) where the number of non-zero elements in each of the K rows of A is needed. To arrive at a fast and memory-efficient implementation of the above algorithms, the following acceleration techniques may be implemented.

Fast Implementation of G_(l) ^((m))

In a GPR context, the mathematical expressions at equations (111) and (112) above can be modified to reduce processing time and required memory by accounting for a symmetric nature of a given radar pulse, accounting for similar discrete time delays between transmission of a given radar pulse and reception of reflections from the given radar pulse, and accounting for a short duration of the given radar pulse. Accordingly, the equation for determining estimates for values x representing reflectance coefficients of the objects in the SOI, involves calculation of the terms G_(l) ^((m)) and H_(l), which calculation can be streamlined according to the above assumptions. In application, a processing device is configured to calculate terms used to obtain the estimated value. Pursuant to these aspects, the expression for G_(l) ^((m)) in (112) can be written as

$\begin{matrix} {G_{l}^{(m)} = {\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\sum\limits_{n = 0}^{N - 1}{\alpha_{ijl} \cdot {p\left( {{nT}_{s} - \tau_{ijl}} \right)} \cdot {g_{ij}^{(m)}\left( {nT}_{s} \right)}}}}}} & (126) \end{matrix}$ where, for n=0, 1, . . . , N−1,

$\begin{matrix} {{g_{ij}^{(m)}\left( {nT}_{s} \right)} = {{y_{ij}\lbrack n\rbrack} - {\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {{p\left( {{nT}_{s} - \tau_{ijl}} \right)}.}}}}} & (127) \end{matrix}$

To facilitate a fast implementation, we approximate the quantity G_(l) ^((m)) by

$\begin{matrix} {{{\hat{G}}_{l}^{(m)}\overset{\Delta}{=}{\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\sum\limits_{n = 0}^{N - 1}{\alpha_{ijl} \cdot {p\left( {{nT}_{s} - {n_{ijl}T_{s}}} \right)} \cdot {{\hat{g}}_{ij}^{(m)}\left( {nT}_{s} \right)}}}}}},{where}} & (128) \\ {n_{ijl}\overset{\Delta}{=}{{round}\left( \frac{\tau_{ijl}}{T_{s}} \right)}} & (129) \\ {{{\hat{g}}_{ij}^{(m)}\left( {nT}_{s} \right)}\overset{\Delta}{=}{{y_{ij}\lbrack n\rbrack} - {{\hat{s}}_{ij}^{(m)}\left( {nT}_{s} \right)}}} & (130) \\ {{{\hat{s}}_{ij}^{(m)}\left( {nT}_{s} \right)}\overset{\Delta}{=}{\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {{p\left( {{nT}_{s} - {n_{ijl}T_{s}}} \right)}.}}}} & (131) \end{matrix}$ We will refer to the set of values {n_(ijl)} as the discrete-time delays. We can write Ĝ_(l) ^((m)) as

$\begin{matrix} {{{\hat{G}}_{l}^{(m)} = {\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\alpha_{ijl} \cdot {\hat{G}}_{ijl}^{(m)}}}}}{where}} & (132) \\ {{\hat{G}}_{ijl}^{(m)} = {\sum\limits_{n = 0}^{N - 1}{{p\left( {{nT}_{s} - {n_{ijl}T_{s}}} \right)} \cdot {{\hat{g}}_{ij}^{(m)}\left( {nT}_{s} \right)}}}} & (133) \\ {\mspace{45mu}{= {\sum\limits_{n = 0}^{N - 1}{{w\left\lbrack {n - n_{ijl}} \right\rbrack} \cdot {{\hat{g}}_{ij}^{(m)}\lbrack n\rbrack}}}}} & (134) \\ {\mspace{45mu}{= {\left\{ {\sum\limits_{n = 0}^{N - 1}{{w\left\lbrack {n - k} \right\rbrack}{{\hat{g}}_{ij}^{(m)}\lbrack n\rbrack}}} \right\} _{k = n_{ijl}}}}} & (135) \end{matrix}$ with w[n]

p(nT_(s)). Because the transmitted pulse p(t) is symmetric, w[n−k]=w[k−n] holds for all n and k, and thus

$\begin{matrix} {{\hat{G}}_{ijl}^{(m)} = {\left\{ {\sum\limits_{n = 0}^{N - 1}{{w\left\lbrack {n - k} \right\rbrack}{{\hat{g}}_{ij}^{(m)}\lbrack n\rbrack}}} \right\} _{k = n_{ijl}}}} & (136) \\ {\mspace{45mu}{= {\left\{ {\left( {w*{\hat{g}}_{ij}^{(m)}} \right)\lbrack k\rbrack} \right\} _{k = n_{ijl}}}}} & (137) \\ {{\mspace{45mu}{= \left\{ {\left( {w*\left( {y_{ij} - {\hat{s}}_{ij}^{(m)}} \right)} \right)\lbrack k\rbrack} \right\}}}_{k = n_{ijl}}.} & (138) \end{matrix}$

It is readily observed that computing Ĝ_(ijl) ^((m)) requires the convolution of the discrete pulse w[n] with the m^(th) iteration of the error-term sequence (y_(ij)[n]−ŝ_(ij) ^((m))[n]). The sequences w[n] and y_(ij)[n] are given. Hence, to efficiently compute Ĝ_(ijl) ^((m)), a computationally efficient way for generating the sequence ŝ_(ij) ^((m))[n] is needed.

First, we note that the collection of discrete-time delays {n_(ijl)} is expected to have repeated values. Let k_(min) and k_(max) denote respectively the minimum and maximum discrete-time delays. The sifting property of the unit impulse function can be used to write

$\begin{matrix} {{{\hat{s}}_{ij}^{(m)}\lbrack n\rbrack} = {\sum\limits_{k = k_{m\; i\; n}}^{k_{{ma}\; x}}\left\{ {\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {w\left\lbrack {n - n_{ijl}} \right\rbrack} \cdot {\delta\left\lbrack {k - n_{ijl}} \right\rbrack}}} \right\}}} & (139) \\ {\mspace{70mu}{= {\sum\limits_{k = k_{m\; i\; n}}^{k_{{ma}\; x}}\left\{ {\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {w\left\lbrack {n - k} \right\rbrack} \cdot {\delta\left\lbrack {k - n_{ijl}} \right\rbrack}}} \right\}}}} & (140) \\ {\mspace{70mu}{= {\sum\limits_{k = k_{m\; i\; n}}^{k_{{ma}\; x}}{\left\{ {\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {\delta\left\lbrack {k - n_{ijl}} \right\rbrack}}} \right\}{w\left\lbrack {n - k} \right\rbrack}}}}} & (141) \\ {\mspace{70mu}{= {\sum\limits_{k = k_{m\; i\; n}}^{k_{{ma}\; x}}{{q_{ij}\lbrack k\rbrack} \cdot {w\left\lbrack {n - k} \right\rbrack}}}}} & (142) \\ {\mspace{70mu}{{= {\left( {q_{ij}*w} \right)\lbrack n\rbrack}}{where}}} & (143) \\ {{q_{ij}\lbrack k\rbrack}\overset{\Delta}{=}{\sum\limits_{l = 1}^{L}{\alpha_{ijl} \cdot x_{l}^{(m)} \cdot {{\delta\left\lbrack {k - n_{ijl}} \right\rbrack}.}}}} & (144) \end{matrix}$ The term q_(ij)[k] can then be expressed as

$\begin{matrix} {{q_{ij}\lbrack k\rbrack}\overset{\Delta}{=}{\sum\limits_{l \in {??}_{k}}d_{ijl}^{(m)}}} & (145) \end{matrix}$ where d_(ijl) ^((m))=α_(ijl)·x_(l) ^((m)) and S_(k)={l=1, 2, . . . , L: n_(ijl)=k}.

In other words, the term q_(ij)[k] is computed by accumulating all elements of d _(ijl) ^((m)) =[d _(ij1) ^((m)) ,d _(ij2) ^((m)) , . . . ,d _(ijL) ^((m)))]  (146) for which associated discrete-time delay indexes n_(ijl) have the same integer value k. Consequently, q_(ij)[k] can then be computed in a very efficient manner using the hash table data structure concept. The indexes of a hash table, typically referred to as keys, are the integers between k_(min) and k_(max), and the record associated with the k^(th) key is the set of values {d _(ijl) ^((m)) :l=1,2, . . . ,L;n _(ijl) =k}.  (147) By one approach, the hash-table-based computation of q_(ij)[k] is implemented using a processing device configured to use MATLAB using the accumarray function. The variables d, n and q store the following sequences: d←d_(ijl) ^((m))=[d_(ij1) ^((m)),d_(ij2) ^((m)), . . . ,d_(ijL) ^((m)),]  (148) n←n_(ijl)=[n_(ij1),n_(ij2), . . . ,n_(ijL)]  (149) q←q_(ij)=[q_(ij)[1],q_(ij)[2], . . . ,q_(ij)[k_(max)]]  (150) The variable q is computed via the command q=accumarray(n, d) where k_(min)≦n_(ijl)≦k_(max) for all l, q_(ij)[k]=0 for all indexes k<k_(min). An example of pseudocode to be run by the processing device for implementation of the proposed algorithm for efficiently computing G_(l) ^((m)) is given below.

Subroutine 1: Pseudocode for computing G_(l) ^((m)) for l=1, 2, . . . , L

for i = 1, 2, . . ., I do  for j = 1, 2, . . ., J do   SET q_(ij)[k] = 0 for 0 ≦ k < k_(min)   for k = k_(min), k_(min) + 1, . . ., k_(max) do    S_(k) = {l = 1, 2, . . ., L:n_(ijl) = k}      ${q_{ij}\lbrack k\rbrack} = {\sum\limits_{l \in S_{k}}{d_{ijl}^{(m)}\mspace{14mu}\left( {{hash}\text{-}{table}\text{-}{implementation}} \right)}}$    end for ŝ_(ij) ^((m))[n] = (q_(ij) * w)[n]    for l = 1, 2, . . ., L do Ĝ_(ijl) ^((m)) = {(w * (y_(ij) − s_(ij)(m)))[k]}|_(k=n) _(ijl)   end for  end for end for ${\hat{G}}_{l}^{(m)} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\alpha_{ijl} \cdot {\hat{G}}_{ijl}^{(m)}}}}$

In addition to being more computationally efficient, the proposed implementation does not require constructing the large system matrix A. A tangible benefit of this fact is the size of data (i.e., the number of transmit locations) that can be used to form an image is no longer limited. It is also readily observed from the pseudocode that the computation of Ĝ_(l) ^((m)) is parallelizable such that faster processing techniques such as parallel or GPU based processing can be used to process the data.

Fast Implementation of H_(l)

An alternative expression for H_(l) in (111) is

$\begin{matrix} {{H_{l} = {\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\sum\limits_{n = 1}^{N}{\left( \alpha_{ijl} \right)^{2} \cdot r_{ijn} \cdot {\beta\left( {{nT}_{s} - \tau_{ijl}} \right)}}}}}},} & (151) \end{matrix}$ where β(t)

p²(t) and r_(ijn) is the number of non-zero elements in the n^(th) row of the N×L sub-matrix A_(ij)=P_(ij)D_(ij). To facilitate a fast implementation, we approximate the quantity H_(l) by

$\begin{matrix} {{{\hat{H}}_{l} = {\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\sum\limits_{n = 1}^{N}{\left( \alpha_{ijl} \right)^{2} \cdot r_{ijn} \cdot {\beta\left( {{nT}_{s} - {n_{ijl}T_{s}}} \right)}}}}}},} & (152) \end{matrix}$ We write Ĥ_(l) as

$\begin{matrix} {{{\hat{H}}_{l} = {\sum\limits_{i = 1}^{I}{\sum\limits_{j = 1}^{J}{\left( \alpha_{ijl} \right)^{2} \cdot {\hat{H}}_{ijl}}}}}{where}} & (153) \\ {{\hat{H}}_{ijl} = {\sum\limits_{n = 0}^{N - 1}{r_{ijn} \cdot {\beta\left( {{nT}_{s} - {n_{ijl}T_{s}}} \right)}}}} & (154) \\ {\mspace{40mu}{= {\sum\limits_{n = 0}^{N - 1}{{\gamma_{ij}\lbrack n\rbrack} \cdot {h\left\lbrack {n - n_{ijl}} \right\rbrack}}}}} & (155) \\ {\mspace{40mu}{= {\left\{ {\sum\limits_{n = 0}^{N - 1}{{\gamma_{ij}\lbrack n\rbrack} \cdot {h\left\lbrack {n - k} \right\rbrack}}} \right\} _{k = n_{ijl}}}}} & (156) \\ {\mspace{40mu}{= {\left\{ {\sum\limits_{n = 0}^{N - 1}{{\gamma_{ij}\lbrack n\rbrack} \cdot {h\left\lbrack {k - n} \right\rbrack}}} \right\} _{k = n_{ijl}}}}} & (157) \\ {\mspace{40mu}{= \left\{ {\left( {\gamma_{ij}*h} \right)\lbrack k\rbrack} \right\}}}_{k = n_{ijl}} & (158) \end{matrix}$ with h[n]

β(nT_(s)), and r_(ijn) is now represented by the n-indexed sequence γ_(ij)[n]

r_(ijn). For the sake of convenience and consistency, we assume here that rows of a matrix are counted starting from a zeroth row. The computation Ĥ_(ijl) requires the convolution of the squared and discretized pulse h[n] with the sequence γ_(ij)[n]. The computation of Ĥ_(ijl) is significantly accelerated with the introduction of a fast procedure for generating γ_(ij)[n].

First, we recall that the sample γ_(ij)[n] is the number of non-zeros entries in the n^(th) row of the N×L sub-matrix A_(ij). Because the radar system has an ultra wide band, the transmitted pulse p(t) is short. Consequently, the samples of the length-N sequence w[n] are zero (or practically zero) for indexes n such that |n|<M and non-zero, otherwise. The parameter M is even and significantly smaller than N. The l^(th) column of A_(ij) coincides with the length-N vector [α_(ijl)·p(0−τ_(ijl)),α_(ijl)·p(T_(s)−τ_(ijl)), . . . ,α_(ij)·p((N−1)T_(s)−τ_(ijl))]^(T).  (159) The (n, l)-entry of A_(ij) is thus non-zero if |nT _(s)−τ_(ijl) |≦MT _(s).  (160) Using (129), the above rule in (160) can be approximated by |n−n _(ijl) |≦M.  (161)

A computed delay index n_(ijl) is such that 0≦n_(ijl)≦N. Consequently, for computational convenience, we write that the (n, l)-entry of A_(ij) is non-zero if max(0,n−M)≦n _(ijl)≦min(n+M,N).  (162) The number γ_(ij)[n] of non-zeros entries in the n^(th) row of A_(ij) is thus equal to the number of elements in the n^(th) row that satisfy (162). A more convenient definition is γ_(ij) [n]=|

_(n)|  (163) where |

_(n)| denotes the number of elements in the set

_(n) ={l=1,2, . . . ,L|max(0,n−M)≦n _(ijl)≦min(n+M,N)}.  (164)

The parameter γ_(ij)[n] can be efficiently computed by taking advantage of the hash-table-based fast implementation concept used in (145). First, we write γ_(ij) [n]=|

_(n) ⁺|−|

_(n) ⁻|  (165) where

_(n) ⁺ ={l=1,2, . . . ,L|n _(ijl)≦min(n+M,N)}  (166)

_(n) ⁻ ={l=1,2, . . . ,L|n _(ijl)≦min(0,n−M−1)}.  (167) The expression in (165) is further expanded as

$\begin{matrix} {{\gamma_{ij}\lbrack n\rbrack} = {{\sum\limits_{k = 0}^{{{mi}n}{({{n + M},N})}}\;{{??}_{k}}} - {\sum\limits_{k = 0}^{\max{({0,{n - M}})}}\;{{??}_{k}}}}} & (168) \end{matrix}$ Finally, we have

$\begin{matrix} {{\gamma_{ij}\lbrack n\rbrack} = {{v\;\left\lbrack {\min\left( {{n + M},N} \right)} \right\rbrack} - {v\left\lbrack {\max\left( {0,{n - M}} \right)} \right\rbrack}}} & (169) \\ {where} & \; \\ {{v\lbrack m\rbrack} = {\sum\limits_{k = 0}^{m}\;{{??}_{k}}}} & (170) \end{matrix}$ with S_(k)={l=1, 2, . . . , L: n_(ijl)=k}. The inner summation in (170) (and, hence the computation of v[m]) is efficiently computed using the hash-table-based fast implementation previously discussed and used in (145). Example pseudocode to be run by the processing device for implementation of the proposed algorithm for efficiently computing H₁ is given below.

Subroutine 2: Pseudocode for computing H_(l) for 1=1, 2, . . . , L

for i = 1, 2, . . ., I do  for j = 1, 2, . . ., J do   SET q[k] = 0 for 0 ≦ k < k_(min)   for k = k_(min), k_(min) + 1, . . ., k_(max) do    S_(k) = {l = 1, 2, . . ., L:n_(ijl) = k}    q[k] = |S_(k)| (hash-table-implementation)   end for   for m = 0, 1, . . ., N do     ${v\lbrack m\rbrack} = {\sum\limits_{k = 0}^{m}\;{q\lbrack k\rbrack}}$   end for   for n = 0, 1, . . ., N do    γ_(ij)[n] = w[min(n + M, N)] − w[max(0, n − M)]   end for   for l = 1, 2, . . ., L do    Ĥ_(ijl) = {(γ_(ij) * h)[k]}|_(k=n) _(ijl)   end for  end for end for ${\hat{H}}_{l} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\left( \alpha_{ijl} \right)^{2} \cdot {\hat{H}}_{ijl}}}}$

Putting together the results for calculating terms G_(l) ^((m)) and H_(l), example pseudocode for the l₁-LS algorithm follows.

Pseudocode for computing l₁-LS algorithm for m=1, 2, . . . , num_(it)Initialization: x⁽⁰⁾={x₁ ⁽⁰⁾, x₂ ⁽⁰⁾, . . . , x_(L) ⁽⁰⁾}

for l = 1, 2, . . ., L do  Compute Ĥ_(l) (via Subroutine 2) end for for m = 1, 2, . . ., num_(it) do   for l = 1, 2, . . ., L do    Compute Ĝ_(l) ^((m)) (via Subroutine 1)   end for    $x_{l}^{({m + 1})} = \frac{{{\hat{H}}_{l} \cdot x_{l}^{(m)}} + {\hat{G}}_{l}^{(m)}}{{\hat{H}}_{l} + \frac{\lambda}{\left. 2 \middle| x_{l}^{(m)} \right|}}$ end for

Summary of Results for the Various Approaches

The described MM-based MAP algorithms are applicable to large-scale, real applications. Although the proposed algorithms effectively estimates reflection coefficients of scenes-of-interest using GPR datasets, the algorithms could be readily applied to a variety of applications where datasets are collected using synthetic aperture imaging measurement principles. We have applied our algorithms to simulated and real datasets supplied by ARL. The results obtained using the described MAP algorithms are consistent in the sense that the resulting images, for both simulated and real datasets, have the same characteristics in terms of noise removal and sidelobe reduction. In general, the images generated were sufficiently sparse without a loss of known scatterers. The desirable results obtained using real datasets are even more encouraging because they suggest that our algorithms are robust enough for practical applications.

With respect to the described Butterworth prior used to exploit the known sparsity constraint of scatterers within an SOI, this approach does not require any user input parameters. It can also be observed that the application of the Butterworth prior need not be limited to modeling the distribution of reflectance coefficients. The Butterworth prior is a good approximation for the uniform distribution and can thus be used in other problems that require a uniform distribution model. Moreover, the majorizing functions we developed for the negative log priors are general enough that they can be used in other MM-based MAP methods regardless of the form of the density function for the corresponding problem. Finally, we have improved upon image reconstruction techniques, namely the DAS and RSM algorithms, currently used by ARL for GPR.

The methods described above can be implemented as illustrated in FIG. 8, where a processing device receives 805 a data set and processes 810 the initial data set by creating an estimated image value for individual voxels in the image by iteratively deriving the estimated image value through application of a majorize-minimize technique to solve a maximum a posteriori (MAP) estimation problem having a MAP objection function associated with a mathematical model of image data from the data. These basic steps can be applied to achieve fast and computationally efficiently prepared image using the estimated image value of individual voxels of the image that can be displayed 815, wherein the initial data set may be sourced from a variety of applications where datasets are collected using synthetic aperture imaging measurement principles. In the GPR context, the method may further include emitting 820 a radar pulse at specified intervals into a scene-of-interest and detecting 825 magnitude of signal reflections from the scene of interest from the radar pulse. Position data is recorded 830 corresponding to individual radar pulse emissions and individual receptions of the signal reflections. The initial data set in this application is created 835 from the position data and detected magnitudes of the signal reflections. Where the method is carried out remote from the vehicle, it is sufficient where the receipt of the data to be processed includes receiving data representing transmission site locations of radar pulses, reception site locations of reception of reflections from the radar pulses, radar-return profiles for pairings of the transmission site locations and the reception site locations, and data samples associated with individual radar-return profiles.

We tested the first described approach to a GPR MAP reconstruction (GMR) algorithm using simulated GPR data that closely mimics the data generated by ARL's SIRE system. FIG. 8 shows the layout of the point-scatterers for the reflectance image.

We corrupted the simulated data with white Gaussian noise (WGN). Note that although we assumed the noise is WGN, the noise in practice may differ greatly from this assumption. We compare the images obtained from the GMR and DAS algorithms subjectively, and objectively using the receiver operating curves (ROCs) that result from the Reed-Xiaoli detector. FIG. 9 shows the image obtained using the DAS algorithm. The image has significant side lobes and shadows, and the background noise is clearly visible. By contrast, the image generated by the GMR algorithm (FIG. 10) is sparser and adequately suppress the side lobes and background noise.

The ROCs in FIGS. 12 and 13 show that the GMR algorithm suppresses the background noise well enough to provide a better detection rate and a lower false alarm rate than the DAS algorithm. Similar results were obtained for data from ARL. We show only the real data image FIGS. 14 and 15 for brevity.

We also similarly evaluated the performance of the MAP algorithm using the Jeffery's, Laplacian-like, and Laplacian priors using a synthetic GPR dataset. These algorithms are compared to the DAS, RSM and LMM algorithms. The LMM algorithm is a fast algorithm that uses the majorization-minimization technique to solve the l₁-regularized least-squares estimation problem (see U.S. patent application Ser. No. 14/245,733 filed Feb. 19, 2014, which is incorporated by reference herein).

FIG. 16 shows the image obtained using the DAS algorithm. The image has significant side-lobes and shadows, and the background noise is clearly visible. The RSM image in FIG. 17 is an improvement over the DAS image but there is room for improvement. By contrast, the images generated by the l₁-regularized least squares algorithm (FIG. 18) and GMR algorithms (see FIGS. 19-21) are sparser and adequately suppresses the side-lobes and background noise. However, unlike the LMM algorithm, the GMR algorithms are user independent because the noise variance and prior parameter are estimated from the observed GPR data.

We also similarly evaluated the performance of the MAP-Butterworth algorithm using both synthetic and real GPR datasets provided by ARL. The MAP-Butterworth algorithm is compared to the DAS, RSM, and LMM algorithms.

We applied the DAS, RSM, LMM and MAP-Butterworth algorithms to the simulated data. In FIG. 22, we observe that the standard DAS algorithm is not able to satisfactorily remove the noise and sidelobes. In FIG. 23, we observe that the RSM algorithm improves upon the DAS algorithm's results but substantial noise and sidelobes are still present in the image it generated. In contrast, as seen in FIG. 24 and FIG. 25, both the LMM and MAP-Butterworth algorithms retain all the scatterers while significantly suppressing the sidelobes and noise. We observe that the scatterers in the MAP-Butterworth-image are slightly more sparse than in the LMM-image. This may be attributed to the fact in the MAP-Butterworth all parameters are optimally estimated whereas the regularization parameter λ in the LMM algorithm was user-selected using a trial-and-error approach.

We now evaluate the performance of the MAP-Butterworth algorithm using data obtained from ARL's FLGPR system. The test data corresponds to measurements taken from I=274 consecutive transmit locations using J=16 receive antennas. We choose a 65×25 m² SOI and divide it into a grid of 250 voxels in the cross-range direction and 3200 voxels in the down-range direction. As with the simulated data, the cross-range and down-range voxel sizes are 0.1 m and 0.02 m, respectively.

Due to the large size of the SOI, an image is not generated by estimating the reflection coefficients of the voxels all at once. Such an image would have cross-range resolution that varies from the near-range to the far-range voxels. The voxels in the near-range would have higher resolution than those in the far-range. To create GPR images with consistent resolution across the SOI, we use the mosaicing approach discussed in reference to FIG. 4 above. The radar-return data and GPS positioning measurements associated with each sub-aperture are used to estimate the reflection coefficients for the corresponding sub-image. The reconstructed sub-images are merged together to obtain the complete image of the SOI. It should be noted that there is a system matrix for each pair of sub-apertures and sub-SOI. In this sense the model for real GPR data is time-varying.

FIGS. 26-29 show the GPR images that were generated using the DAS, RSM and LMM and MAP-Butterworth algorithms, respectively. It is evident from these figures that the DAS image has significant side lobes and shadows, with clearly visible background noise. Although, side lobes and shadows are reduced in the RSM image, there is still room for improvement. The images obtained using the LMM and MAP-Butterworth algorithms, shown respectively in FIGS. 28 and 29, are sparser than the DAS and RSM images, and adequately suppress both the side lobes and background noise. These results show that the MAP-Butterworth algorithm gives results comparable to those of popular l₁-LS algorithms without the usual challenges of finding the most suitable regularization parameter.

Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept. 

The invention claimed is:
 1. A method of creating an image from received data by estimating unknown reflection coefficients for individual voxels in a scene of interest (SOI), the method comprising: receiving the data; processing the data with a processing device by iteratively deriving an estimated image value for individual voxels in the image through application of a majorize-minimize technique to minimize a maximum a posteriori (MAP) objective function applied to the data, wherein the MAP objective function includes: a data component including at least a portion of the data, and a prior probability density function for the unknown reflection coefficients configured to apply a restriction that a majority of the reflection coefficients are equal to zero or substantially equal to zero; displaying the image using the estimated image value of individual voxels of the scene of interest.
 2. The method of claim 1 wherein the receiving the data comprises receiving data including information representing 1) transmission site locations of radar return signals, 2) reception site locations of reception of reflections from the radar pulses, 3) radar-return profiles for pairings of the transmission site locations and the reception site locations, and 4) data samples associated with individual radar-return profiles.
 3. The method of claim 1 wherein the data component of the MAP objective function is ${\frac{1}{2\sigma^{2}}{\phi_{1}(x)}} + {\frac{K}{2}\log\;\sigma^{2}}$ where σ² is variance of noise in the data and where φ₁(x)

∥y−Ax∥₂ ²; where y is the data, A is a K×L system matrix associated with the data, K=IJL where I is a number of transmission site locations, J is a number of reception sites for each transmit site location, L is a number of pixels in the image, and x is a vector representing the reflection coefficients for the scene of interest.
 4. The method of claim 3 wherein the prior probability density function for the unknown reflection coefficients is $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\lambda} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( {x_{l},\lambda} \right)}\overset{\Delta}{=}{{- \log}\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{x_{l}}} \right)}}}$ where λ is a constant controlling a degree of application of the restriction that the majority of the reflection coefficients are small.
 5. The method of claim 4 wherein the iteratively deriving the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using for l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{{H_{l} \cdot x_{l}^{(m)}} + G_{l}^{(m)}}{H_{l} + \frac{\sigma^{2^{(m)}}{s^{\prime}\left( {x_{l}^{(m)},\lambda^{(m)}} \right)}}{x_{l}^{(m)}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where r_(k) is the number of non-zero elements in a k^(th) row of a matrix A, x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and ${s^{\prime}\left( {a,\lambda} \right)} = \frac{\lambda\; a\;{\exp\left( {\lambda{a}} \right)}}{{a}\left\lbrack {1 + {\exp\left( {\lambda{a}} \right)}} \right\rbrack}$ where λ is a constant controlling a degree of application of the restriction that the majority of the reflection coefficients are small.
 6. The method of claim 5 wherein $\sigma^{2^{(m)}} = \frac{{{y - {Ax}^{(m)}}}_{2}^{2}}{K}$ is used as an estimate for the variance σ² of the noise in the data for iterate number (m) and ${\hat{\lambda}}^{(m)} = {\arg{\min\limits_{\lambda}{\phi\left( {x^{(m)},\sigma^{2^{(m)}},\lambda} \right)}}}$ as solved using a 1D line search, and where φ is the MAP objection function, is used as an estimate for the constant controlling the degree of application of the restriction that the majority of the reflection coefficients are small, and where x^((m))

[x₁ ^((m)),x₂ ^((m)), . . . ,x_(l) ^((m))].
 7. The method of claim 3 wherein a negative log of a summation of the prior probability density function for the unknown reflection coefficients is $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s(a)}\overset{\Delta}{=}{{- \log}{\frac{1}{a}.}}$
 8. The method of claim 7 wherein the iteratively deriving the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2^{(m)}}\frac{1}{\left( x_{l}^{(m)} \right)^{2}}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}(a)}{a} = {\frac{1}{a^{2}}.}$
 9. The method of claim 8 wherein $\sigma^{2^{({m + 1})}} = \frac{{{y - {Ax}^{({m + 1})}}}_{2}^{2}}{K}$ is used as an estimate for the variance σ² of the noise in the data for iterate number (m+1) and $\lambda^{({m + 1})} = {\arg{\min\limits_{\lambda}\;{\phi\left( {x^{({m + 1})},\lambda,\sigma^{2^{(m)}}} \right)}}}$ as solved using a 1D line search, and where φ is the MAP objection function, is used as an estimate for the constant controlling the degree of application of the restriction that the majority of the reflection coefficients are small, and where x^((m))

[x₁ ^((m)), x₂ ^((m)), . . . , x_(l) ^((m))].
 10. The method of claim 3 wherein the prior probability density function for the unknown reflection coefficients is a function with an approximately uniform peak across a width, wherein beyond the width the function decays to zero.
 11. The method of claim 10 wherein the function is defined as ${f\left( {a;\varepsilon} \right)}\overset{\Delta}{=}\frac{k_{n}(\varepsilon)}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}$ where ∈ and n are parameters that control width and decay rate for this prior probability density function and where ${k_{n}(\varepsilon)}\overset{\Delta}{=}{\left\lbrack {\overset{\infty}{\int\limits_{- \infty}}{\frac{1}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}d\; a}} \right\rbrack^{- 1}.}$
 12. The method of claim 11 wherein the iteratively deriving the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = {x_{l}^{(m)} + \frac{G_{l}^{(m)} - {\sigma^{2{(m)}} \cdot {g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}}}{{\sigma^{2{(m)}} \cdot B} + H_{l}}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ^(2(m)) is the estimate of the variance of noise in the data at the iterate number (m), B is least upper bound of the second derivative of ${g\left( {a;\varepsilon} \right)} = {{- \log}\frac{k_{n}(\varepsilon)}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}}$ where ∈ and n are parameters that control width and decay rate for the prior probability density function and where ${k_{n}(\varepsilon)}\overset{\Delta}{=}\left\lbrack {\overset{\infty}{\int\limits_{- \infty}}{\frac{1}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}d\; a}} \right\rbrack^{- 1}$ and  where ${g^{\prime}\left( {a;\varepsilon} \right)} = \frac{2{n\left( \frac{a}{\varepsilon} \right)}^{{2n} - 1}}{\varepsilon\left( {1 + \left( \frac{a}{\varepsilon} \right)^{2n}} \right)}$ and where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}{{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}.}}$
 13. The method of claim 12 wherein the variance of the noise in the data is estimated using a maximum likelihood estimate of noise power.
 14. The method of claim 13 wherein the maximum likelihood estimate of noise power for iterate number (m) is given by $\sigma^{2^{(m)}} = {\frac{1}{K}{{y - {Ax}^{(m)}}}_{2}^{2}}$
 15. The method of claim 1 wherein the calculating the terms comprises: computing G_(l) ^((m)) by applying a hash-table-based computation to ${q_{ij}\lbrack k\rbrack} = {\sum\limits_{l \in S_{k}}\; d_{ijl}^{(m)}}$ where d_(ijl) ^((m))=a_(ijl)x_(l) ^((m)) and S_(k)={l=1,2, . . . , L:n_(ijl)=k} and a_(ijl) represents attenuation of the given radar pulse during travel from an i^(th) transmit location to an l^(th) voxel and back to a j^(th) receiver.
 16. The method of claim 15 further comprising computing G_(l) ^((m)) via i=1,2, . . . , I; j=1,2, . . . , J; l=1,2, . . . , L $G_{l}^{(m)} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\alpha_{ijl}G_{ijl}^{(m)}}}}$ where G_(ijl)^((m)) = {(w * (y_(ij) − s_(ij)^((m))))[k]}❘_(k = n_(ijl))s_(ij)^((m))[n] = (q_(ij) * w)[n] and where y_(ij)[k] is a k^(th)sample of a radar-return profile associated with an i^(th) transmit location and j^(th) receiver, s_(ij) ^((m))[k] is the m^(th) estimate of a noise-free component of y_(ij)[k], w is a discretized version of the given radar pulse, and n_(ijl) is a discrete time-delay corresponding to rounding the quotient of the time taken by the given radar pulse to travel from a transmitter at the i^(th) transmit location to the l^(th) voxel and back to the j^(th) receiver and a sampling interval, and * denotes discrete-time convolution.
 17. The method of claim 12 wherein the calculating of the terms comprises: computing H_(l) by applying a hash-table-based computation to |S_(k)| where |S_(k)| denotes a number of elements in the set S_(k).
 18. The method of claim 17 wherein the computing H_(l) further comprises computing H_(l) via i=1,2, . . . . ,I;j=1,2, . . . . ,J;l=1,2, . . . ,L $H_{l} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\alpha_{ijl}^{2}H_{ijl}}}}$ where ${H_{ijl} = {{\left( {\gamma_{ij}*h} \right)\lbrack k\rbrack}❘_{k = n_{ijl}}}},{{\gamma_{ij}\lbrack n\rbrack} = {v\left( {{\min\left( {{n + M},N} \right)} - {v\left( {\max\left( {0,{n - M}} \right)} \right)}} \right)}},{{v(m)} = {\sum\limits_{k = 0}^{m}\;{S_{k}}}}$ and where h is a discretized version of a squared radar pulse and 2M+1 is a number of non-zero samples in the given radar pulse.
 19. The method of claim 1 further comprising: emitting a radar pulse at specified intervals into a scene of interest; detecting amplitudes of signal reflections from the scene of interest from the radar pulse; recording position data corresponding to individual radar pulse emissions and individual receptions of the signal reflections; and creating the initial data set from the position data and detected amplitudes of the signal reflections.
 20. An apparatus for detecting objects in a scene of interest, the apparatus comprising: a vehicle; a plurality of radar transmission devices mounted on the vehicle and configured to transmit radar pulses into a scene of interest; a plurality of radar reception devices mounted on the vehicle, individual ones of the plurality of radar reception devices being configured to detect amplitudes of signal reflections received from the scene of interest from the radar pulses; a location determination device configured to detect a location of the vehicle at a time of transmission of the radar pulse from the plurality of radar transmission devices and reception of the signal reflections by the radar reception devices; a processing device configured to process a data set including information representing 1) transmission site locations of individual ones of the radar pulses, 2)reception site locations of reception of individual ones of the signal reflections, and 3) a number of data samples per reception profile by: iteratively deriving an estimated image value for each voxel in the image through application of a majorize-minimize technique to minimize a maximum a posteriori (MAP) objective function applied to the data wherein the MAP objective function includes: a data component including at least a portion of the data, and a prior probability density function for the unknown reflection coefficients to apply an assumption that a majority of the reflection coefficients are equal to zero or substantially equal to zero.
 21. The apparatus of claim 20 wherein the processing device is further configured to calculate the data component of the MAP objective function as ${\frac{1}{2\sigma^{2}}{\phi_{1}(x)}} + {\frac{K}{2}\log\;\sigma^{2}}$ where σ² is variance of noise in the data and where φ₁(x)

∥y−Ax∥₂ ²; where y is the data, A is a K×L system matrix associated with the data, K=IJL where I is a number of transmission site locations, J is a number of reception sites for each transmit site location, L is a number of pixels in the image, and x is a vector representing the reflection coefficients for the scene of interest.
 22. The apparatus of claim 21 wherein the processing device is further configured to use the prior probability density function for the unknown reflection coefficients defined as $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\lambda} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( x_{l} \right)}\overset{\Delta}{=}{{- \log}\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{x_{l}}} \right)}}}$ where λ is a constant controlling a degree of application of the restriction that the majority of the reflection coefficients are small.
 23. The apparatus of claim 22 wherein the processing device is further configured to iteratively derive the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{{H_{l} \cdot x_{l}^{(m)}} + G_{l}^{(m)}}{H_{l} + \frac{\sigma^{2^{(m)}}{s^{\prime}\left( {x_{l}^{(m)},\lambda^{(m)}} \right)}}{x_{l}^{(m)}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where r_(k) is the number of non-zero elements in a k^(th) row of a matrix A, x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² is variance of noise in the data, and ${s^{\prime}\left( {a,\lambda} \right)} = \frac{\lambda\; a\;{\exp\left( {\lambda{a}} \right)}}{{a}\left\lbrack {1 + {\exp\left( {\lambda{a}} \right)}} \right\rbrack}$ where λ is a constant controlling a degree of application of the restriction that the majority of the reflection coefficients are small.
 24. The apparatus of claim 23 wherein $\sigma^{2^{(m)}} = \frac{{{y - {Ax}^{(m)}}}_{2}^{2}}{K}$ is used as an estimate for the variance σ² of the noise in the data and ${\hat{\lambda}}^{(m)} = {\arg{\min\limits_{\lambda}\;{\phi\left( {x^{(m)},\sigma^{2^{(m)}},\lambda} \right)}}}$ as solved using a 1D line search, and where φ is the MAP objection function, is used as an estimate for the constant controlling the degree of application of the restriction that the majority of the reflection coefficients are small, and where x^((m))

[x₁ ^((m)), x₂ ^((m)), . . . , x_(l) ^((m))].
 25. The apparatus of claim 21 wherein the processing device is further configured to use a negative log of a summation of the prior probability density function for the unknown reflection coefficients as $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s(a)}\overset{\Delta}{=}{{- \log}{\frac{1}{a}.}}$
 26. The apparatus of claim 25 wherein the processing device is further configured to iteratively derive the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2^{(m)}}\frac{1}{\left( x_{l}^{(m)} \right)^{2}}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}(a)}{a} = {\frac{1}{a^{2}}.}$
 27. The apparatus of claim 26 wherein $\sigma^{2^{({m + 1})}} = \frac{{{y - {Ax}^{({m + 1})}}}_{2}^{2}}{K}$ is used as an estimate for the variance σ² of the noise in the data for iterate number (m+1) and $\lambda^{({m + 1})} = {\arg{\min\limits_{\lambda}\;{\phi\left( {x^{({m + 1})},\lambda,\sigma^{2^{(m)}}} \right)}}}$ as solved using a 1D line search, and where φ is the MAP objection function, is used as an estimate for the constant controlling the degree of application of the restriction that the majority of the reflection coefficients are small, and where x^((m))

[x₁ ^((m)), x₂ ^((m)), . . . , x_(l) ^((m))].
 28. The apparatus of claim 21 wherein the prior probability density function for the unknown reflection coefficients is a function with an approximately uniform peak across a width, wherein beyond the width the function decays to zero.
 29. The apparatus of claim 28 wherein the function is defined as ${f\left( {a;\varepsilon} \right)}\overset{\Delta}{=}\frac{k_{n}(\varepsilon)}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}$ where ∈ and n are parameters that control width and decay rate and where ${k_{n}(\varepsilon)}\overset{\Delta}{=}{\left\lbrack {\overset{\infty}{\int\limits_{- \infty}}{\frac{1}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}d\; a}} \right\rbrack^{- 1}.}$
 30. The apparatus of claim 29 wherein the processing device is further configured to iteratively derive the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = {x_{l}^{(m)} + \frac{G_{l}^{(m)} - {\sigma^{2{(m)}} \cdot {g^{\prime}\left( {x_{l}^{(m)};\varepsilon} \right)}}}{{\sigma^{2{(m)}} \cdot B} + H_{l}}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ^(2(m)) is the estimate of the variance of noise in the data at the iterate number (m), B is least upper bound of the second derivative of ${g\left( {a;\varepsilon} \right)} = {{- \log}\frac{k_{n}(\varepsilon)}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}}$ where ∈ and n are parameters that control width and decay rate for the prior probability density function and where ${k_{n}(\varepsilon)}\overset{\Delta}{=}\left\lbrack {\overset{\infty}{\int\limits_{- \infty}}{\frac{1}{1 + \left( \frac{a}{\varepsilon} \right)^{2n}}d\; a}} \right\rbrack^{- 1}$ and  where ${g^{\prime}\left( {a;\varepsilon} \right)} = \frac{2{n\left( \frac{a}{\varepsilon} \right)}^{{2n} - 1}}{\varepsilon\left( {1 + \left( \frac{a}{\varepsilon} \right)^{2n}} \right)}$ and where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}\;{{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}.}}$
 31. The apparatus of claim 30 wherein the variance of the noise in the data is estimated using a maximum likelihood estimate of noise power.
 32. The apparatus of claim 31 wherein the maximum likelihood estimate of noise power for the iterate number (m) is given by $\sigma^{2{(m)}} = {\frac{1}{K}{{y - {Ax}^{(m)}}}_{2}^{2}}$ where x^((m))

[x₁ ^((m)), x₂ ^((m)), . . . , x_(l) ^((m))].
 33. The apparatus of claim 20 wherein the processing device is configured to calculate the terms by: computing G_(l) ^((m)) by applying a hash-table-based computation to ${q_{i,j}\lbrack k\rbrack} = {\sum\limits_{l \in S_{k}}\; d_{ijl}^{(m)}}$ where d_(ijl) ^((m))=α_(ijl)x_(l) ^((m)) and S_(k)={l=1, 2, . . . , L: n_(ijl)=k} and a_(ijl) represents attenuation of the given radar pulse during travel from an i^(th) transmit location to an l^(th) voxel and back to a j^(th) receiver.
 34. The apparatus of claim 33 wherein the processing device is further configured to compute G_(l) ^((m)) via i=1, 2, . . . , I; j=1, 2, . . . , J; l=1, 2, . . . , L $G_{l}^{(m)} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\alpha_{ijl}G_{ijl}^{(m)}}}}$ where G_(ijl)^((m)) = {[w * (y_(ij) − s_(ij)^((m))))[k]}❘_(k = n_(ijl))s_(ij)^((m))[n] = (q_(ij) * w)[n] and where y_(ij)[k] is a k^(th) sample of a radar-return profile associated with an i^(th) transmit location and j^(th) receiver, s_(ij) ^((m))[k] is the m^(th) estimate of a noise-free component of y_(ij)[k], w is a discretized version of the given radar pulse, and n_(ijl) is a discrete time-delay corresponding to rounding the quotient of the time taken by the given radar pulse to travel from a transmitter at the i^(th) transmit location to the l^(th) voxel and back to the j^(th) receiver and a sampling interval, and * denotes discrete-time convolution.
 35. The apparatus of claim 20 wherein processing device is configured to calculate the terms by: computing H_(l) by applying a hash-table-based computation to |S_(k)| where |S_(k)| denotes a number of elements in the set S_(k).
 36. The apparatus of claim 35 wherein the processing device is configured to compute H_(l) via i=1, 2, . . . . , l; j=1, 2, . . . . , J; l=1, 2, . . . , L $H_{l} = {\sum\limits_{i = 1}^{I}\;{\sum\limits_{j = 1}^{J}\;{\alpha_{ijl}^{2}H_{ijl}}}}$ where ${H_{ijl} = {{\left( {\gamma_{ij}*h} \right)\lbrack k\rbrack}❘_{k = n_{\;^{ijl}}}}},{{\gamma_{ij}\lbrack n\rbrack} = {v\;\left( {{\min\left( {{n + M},N} \right)} - {v\left( {\max\left( {0,{n - M}} \right)} \right)}} \right)}},{{v(m)} = {\sum\limits_{k = 0}^{m}\;{S_{k}}}}$ and where h is a discretized version of a squared radar pulse and 2M+1 is a number of non-zero samples in the given radar pulse.
 37. The method of claim 3 wherein the prior probability density function for the unknown reflection coefficients is $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( {a;\lambda} \right)}\overset{\Delta}{=}{{- \log}{\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{a}} \right)}}.}}$
 38. The method of claim 37 wherein the iteratively deriving the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{{\lambda exp}\left( {\lambda{x_{l}^{(m)}}} \right)}{{x_{l}^{(m)}}\left( {1 + {\exp\left( {\lambda{x_{l}^{(m)}}} \right)}} \right)}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = {\frac{{\lambda exp}\left( {\lambda{a}} \right)}{{a}\left( {1 + {\exp\left( {\lambda{a}} \right)}} \right)}.}$
 39. The method of claim 3 wherein a negative log of a summation of the prior probability density function for the unknown reflection coefficients is $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( {a;\lambda} \right)}\overset{\Delta}{=}{{- \log}\frac{\lambda}{2}{{\exp\left( {{- \lambda}{a}} \right)}.}}$
 40. The method of claim 39 wherein the iteratively deriving the estimated image value comprises iteratively deriving the reflection coefficient for a given voxel using f or l=1,2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{\lambda}{x_{l}^{(m)}}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = {\frac{\lambda}{a}.}$
 41. The apparatus of claim 21 wherein the processing device is further configured to use a negative log of a summation of the prior probability density function for the unknown reflection coefficients as $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( {a;\lambda} \right)}\overset{\Delta}{=}{{- \log}{\frac{\frac{\lambda}{\log\; 4}}{1 + {\exp\left( {\lambda{a}} \right)}}.}}$
 42. The apparatus of claim 41 wherein the processing device is further configured to iteratively derive the reflection coefficient for a given voxel using f or l=1, 2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{{\lambda exp}\left( {\lambda{x_{l}^{(m)}}} \right)}{{x_{l}^{(m)}}\left( {1 + {\exp\left( {\lambda{x_{l}^{(m)}}} \right)}} \right)}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = {\frac{{\lambda exp}\left( {\lambda{a}} \right)}{{a}\left( {1 + {\exp\left( {\lambda{a}} \right)}} \right)}.}$
 43. The apparatus of claim 21 wherein the processing device is further configured to use a negative log of a summation of the prior probability density function for the unknown reflection coefficients as $\sum\limits_{l = 1}^{L}\;{s\left( {x_{l},\theta} \right)}$ where x_(l) is the estimated image value at an l^(th) voxel, and ${s\left( {a;\lambda} \right)}\overset{\Delta}{=}{{- \log}\frac{\lambda}{2}{{\exp\left( {{- \lambda}{a}} \right)}.}}$
 44. The apparatus of claim 43 wherein the processing device is further configured to iteratively derive the reflection coefficient for a given voxel using f or l=1, 2, . . . , L $x_{l}^{({m + 1})} = \frac{G_{l}^{(m)} + {x_{l}^{(m)}H_{l}}}{H_{l} + {\sigma^{2}\frac{\lambda}{x_{l}^{(m)}}}}$ where $H_{l} = {\sum\limits_{k = 1}^{K}\;{r_{k}A_{kl}^{2}}}$ $G_{l}^{(m)} = {\sum\limits_{k = 1}^{K}\;{A_{kl}\left( {y_{k} - \left\lbrack {Ax}^{(m)} \right\rbrack_{k}} \right)}}$ where x_(l) ^((m+1)) is the estimated image value at an l^(th) voxel for iterate number (m+1), x_(l) ^((m)) is the estimated image value at an l^(th) voxel for iterate number (m), σ² ^((m)) is an estimation of variance of noise in the data, and $\frac{s^{\prime}\left( {a;\lambda} \right)}{a} = {\frac{\lambda}{a}.}$ 